From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: linux-perf-users@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
James Clark <james.clark@linaro.org>
Subject: Re: Question: perf report & top memory usage
Date: Thu, 05 Mar 2026 10:02:01 -0800
Message-ID: <87qzpym9ye.fsf@oracle.com>
In-Reply-To: <aZUbSLeRCSdTp55m@google.com>

Namhyung Kim <namhyung@kernel.org> writes:
> Hello,
>
> On Tue, Feb 17, 2026 at 11:08:19AM -0800, Stephen Brennan wrote:
>> Hello all,
>>
>> I had an interesting case where perf report required 35 GiB of memory to create
>> a report for a 400 MiB data file. Unfortunately I don't believe I can share the
>> perf.data, but I did an analysis and wanted to share what I found and ask some
>> questions.
>>
>> The particular data file contains 1,087,091 samples, with call chains, generated
>> by a pretty standard "perf record -a -g sleep 10" on a machine with 76 CPUs. I
>> looked at the perf report code and profiled memory allocations. Three items
>> seemed to dominate memory use:
>>
>> 1. Histogram columns. The default being "comm,dso,symbol". The more buckets that
>> the data is broken into, the more memory is used, and the histogram columns
>> directly control this.
>>
>> 2. Callchains. The default is to track them when the perf.data contains them,
>> though it can be disabled with "-g none". The data structure storing call
>> chains seems pretty efficient (a prefix tree) but it looks like there is one
>> per histogram bucket. This makes sense, but it seems duplicative with #3.
>>
>> 3. Accumulating child overhead. The default is to do this, creating the
>> "Children" column in the report. The implementation walks the stack for each
>> sample, creating a histogram bucket for each stack frame (even if no samples
>> were observed actually executing in those symbols).
>>
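To make item 3 concrete, here is a toy model of the bucket-per-frame
behavior. This is illustrative only: the names and structures are made
up, not perf's actual implementation.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model of child-overhead accumulation -- illustrative only, not
 * perf's actual data structures or function names.  Each sample carries
 * a callchain, leaf first.  In "self" mode only the leaf symbol gets a
 * histogram bucket; with children accumulation every ancestor frame
 * gets one too, even if no sample executed in that symbol itself.
 */
#define MAX_BUCKETS 64

static int nbuckets;
static const char *buckets[MAX_BUCKETS];

/* Record a bucket for sym if one doesn't already exist */
static void bucket_add(const char *sym)
{
	for (int i = 0; i < nbuckets; i++)
		if (!strcmp(buckets[i], sym))
			return;
	buckets[nbuckets++] = sym;
}

/* chains[i] is a NULL-terminated callchain for sample i, leaf first */
int count_buckets(const char *chains[][4], int nsamples, int children)
{
	nbuckets = 0;
	for (int i = 0; i < nsamples; i++) {
		bucket_add(chains[i][0]);	/* leaf: self overhead */
		if (children)
			for (int j = 1; chains[i][j]; j++)
				bucket_add(chains[i][j]);
	}
	return nbuckets;
}
```

With two samples that both land in memcpy via different call paths,
self mode needs one bucket while children mode needs four: the bucket
count overtakes the sample count, which is the same effect as the
1.3M-buckets-vs-1M-samples numbers below.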
>> My understanding is that the 35 GiB memory usage then comes from a sort of
>> combinatorial explosion. In this data file, nearly every process has a unique
>> comm with numeric identifiers embedded within (e.g. "db1234"). This means that
>> the default "comm,dso,symbol" sort will result in a large number of buckets. The
>> call stacks are reasonably deep (though not absurdly so). There are many
>> non-leaf functions in the call stacks which don't have any Self samples. Child
>> overhead accumulation creates more buckets than there are samples: around 1.3
>> million buckets, compared to 1 million samples.
>>
>> From this perspective, the memory usage makes sense to me. I understand that I
>> could tweak any combination of those knobs to ameliorate the issue. The most
>> straightforward option is to use "-s dso,symbol" because the "comm" column
>> wasn't informative for this workload. I also created a new histogram column
>> implementation (see below) that represents a command with any digits stripped,
>> so that the commands could still be grouped together, without the numeric
>> identifiers disrupting the bucketing. These solutions reduce memory used to 5.1
>> and 5.4 GiB respectively.
>>
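For reference, the knobs above map onto these perf report switches
(the first two are the options discussed in this thread; --no-children
is the standard switch for disabling item 3):

```shell
perf report -s dso,symbol    # drop the per-process "comm" sort key
perf report -g none          # don't display call chains
perf report --no-children    # disable child-overhead accumulation
```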
>> My concern is that most users aren't prepared to dive into this sort of detail,
>> especially when they're likely already in the middle of an analysis of some
>> other performance issue. While they may be familiar with the call graph options
>> and event selection choices, in my experience they generally aren't aware of the
>> many options that "perf report" provides. They certainly aren't aware of these
>> memory trade-offs, especially for what seemed like an innocuous 10-second data
>> collection at the default sample rate.
>>
>> To sum up, I have the following questions:
>>
>> 1. Does my analysis make sense and seem consistent with your understanding?
>
> Yes, it does!
>
>> 2. Does anybody else deal with this sort of memory usage issue, and have
>> strategies they can share?
>
> No, but I think "-s dso,sym" should work fine. And it's the default
> sort key for `perf top` command.

Interesting, I did overlook that this is the default for perf top, thanks.
>> 3. Does the patch below for the custom column make sense to submit? I know it's
>> rather workload specific, but it could be useful for others in this situation.
>
> I think it makes sense and could be useful. Maybe it's good to group
> kworker threads together.
>
> Different options would be
>
> 1. to add an option to enable it for the existing 'comm' sort key.
> 2. to have a regex pattern or so to specify names to merge.
>
> But a separate sort key as you did seems to be fine.

Thank you, those are good options as well. I hadn't considered the regex
approach before. I'll think more about it, because I do like that
flexibility.
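For the record, a minimal sketch of what option 2 might look like,
using plain POSIX regex.h rather than perf's internals; the function
name and the "<N>" placeholder are my invention:

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of merging comm names by regex: every substring
 * of the comm matching the user-supplied pattern is replaced by "<N>",
 * so e.g. "db1234" and "db5678" collapse into the same bucket key
 * "db<N>".  Assumes out is large enough for the canonicalized name.
 */
void comm_canonicalize(const char *comm, const char *pattern,
		       char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	size_t used = 0;

	if (regcomp(&re, pattern, REG_EXTENDED)) {
		snprintf(out, outsz, "%s", comm);	/* bad pattern: pass through */
		return;
	}
	while (!regexec(&re, comm, 1, &m, 0) && m.rm_eo > m.rm_so) {
		/* copy the text before the match, then the placeholder */
		used += snprintf(out + used, outsz - used, "%.*s<N>",
				 (int)m.rm_so, comm);
		comm += m.rm_eo;
	}
	snprintf(out + used, outsz - used, "%s", comm);
	regfree(&re);
}
```

With the pattern "[0-9]+" this also groups your kworker example:
"kworker/3:1-events" becomes "kworker/<N>:<N>-events".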
>>
>> Patch for the "commIgnoreDigit" column. For this workload, it reduced perf
>> report's peak RSS from 35 GiB to around 5.4 GiB, when used in place of "comm":
>
> We don't use CamelCase in the perf code base. But otherwise looks ok.
>
> Thanks,
> Namhyung

Thanks Namhyung! I'll go ahead and send this patch without camel case
for further discussion.

Stephen
>>
>> From df1452ae742d933b45c18d9dde090c11fb3cf846 Mon Sep 17 00:00:00 2001
>> From: Stephen Brennan <stephen.s.brennan@oracle.com>
>> Date: Wed, 3 Dec 2025 16:01:49 -0800
>> Subject: [PATCH 1/1] tools: perf: add commIgnoreDigit
>>
>> The "comm" column allows grouping events by the process command. It is
>> intended to group like programs, despite having different PIDs. But some
>> workloads may adjust their own command, so that a unique identifier
>> (e.g. a PID or some other numeric value) is part of the command name.
>> This destroys the utility of "comm", forcing perf to place each unique
>> process name into its own bucket, which can contribute to a
>> combinatorial explosion of memory use in perf report.
>>
>> Create a less strict version of this column, which ignores digits when
>> comparing command names. This allows "similar looking" processes to
>> again be placed in the same bucket.
>>
>> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>> ---
>> tools/perf/util/hist.c | 2 +
>> tools/perf/util/hist.h | 1 +
>> tools/perf/util/sort.c | 88 +++++++++++++++++++++++++++++++++++++++++-
>> tools/perf/util/sort.h | 1 +
>> 4 files changed, 91 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
>> index ef4b569f7df46..5f691d9b0272d 100644
>> --- a/tools/perf/util/hist.c
>> +++ b/tools/perf/util/hist.c
>> @@ -110,6 +110,8 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
>> len = thread__comm_len(h->thread);
>> if (hists__new_col_len(hists, HISTC_COMM, len))
>> hists__set_col_len(hists, HISTC_THREAD, len + 8);
>> + if (hists__new_col_len(hists, HISTC_COMM_IGNORE_DIGIT, len))
>> + hists__set_col_len(hists, HISTC_THREAD, len + 8);
>>
>> if (h->ms.map) {
>> len = dso__name_len(map__dso(h->ms.map));
>> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
>> index 1d5ea632ca4e1..ae7e98bd9e46d 100644
>> --- a/tools/perf/util/hist.h
>> +++ b/tools/perf/util/hist.h
>> @@ -44,6 +44,7 @@ enum hist_column {
>> HISTC_THREAD,
>> HISTC_TGID,
>> HISTC_COMM,
>> + HISTC_COMM_IGNORE_DIGIT,
>> HISTC_CGROUP_ID,
>> HISTC_CGROUP,
>> HISTC_PARENT,
>> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
>> index f3a565b0e2307..656b5cc62a730 100644
>> --- a/tools/perf/util/sort.c
>> +++ b/tools/perf/util/sort.c
>> @@ -1,4 +1,5 @@
>> // SPDX-License-Identifier: GPL-2.0
>> +#include <ctype.h>
>> #include <errno.h>
>> #include <inttypes.h>
>> #include <regex.h>
>> @@ -265,6 +266,89 @@ struct sort_entry sort_comm = {
>> .se_width_idx = HISTC_COMM,
>> };
>>
>> +/* --sort commIgnoreDigit */
>> +
>> +static int64_t strcmp_nodigit(const char *left, const char *right)
>> +{
>> + for (;;) {
>> + while (*left && isdigit(*left)) left++;
>> + while (*right && isdigit(*right)) right++;
>> + if (*left == *right && !*left) {
>> + return 0;
>> + } else if (*left == *right) {
>> + left++;
>> + right++;
>> + } else {
>> + return (int64_t)*left - (int64_t)*right;
>> + }
>> + }
>> +}
>> +
>> +static int64_t
>> +sort__commIgnoreDigit_cmp(struct hist_entry *left, struct hist_entry *right)
>> +{
>> + return strcmp_nodigit(comm__str(right->comm), comm__str(left->comm));
>> +}
>> +
>> +static int64_t
>> +sort__commIgnoreDigit_collapse(struct hist_entry *left, struct hist_entry *right)
>> +{
>> + return strcmp_nodigit(comm__str(right->comm), comm__str(left->comm));
>> +}
>> +
>> +static int64_t
>> +sort__commIgnoreDigit_sort(struct hist_entry *left, struct hist_entry *right)
>> +{
>> + return strcmp_nodigit(comm__str(right->comm), comm__str(left->comm));
>> +}
>> +
>> +static int hist_entry__commIgnoreDigit_snprintf(struct hist_entry *he, char *bf,
>> + size_t size, unsigned int width)
>> +{
>> + int ret = 0;
>> + unsigned int print_len, printed = 0, start = 0, end = 0;
>> + bool in_digit;
>> + const char *comm = comm__str(he->comm), *print;
>> + while (printed < width && printed < size && comm[start]) {
>> + in_digit = !!isdigit(comm[start]);
>> + end = start + 1;
>> + while (comm[end] && !!isdigit(comm[end]) == in_digit) end++;
>> + if (in_digit) {
>> + print_len = 3; /* <N> */
>> + print = "<N>";
>> + } else {
>> + print_len = end - start;
>> + print = &comm[start];
>> + }
>> + print_len = min(print_len, width - printed);
>> + ret = repsep_snprintf(bf + printed, size - printed, "%-.*s",
>> + print_len, print);
>> + if (ret < 0)
>> + return ret;
>> + start = end;
>> + printed += ret;
>> + }
>> + /* Pad to width if necessary */
>> + if (printed < width && printed < size) {
>> + ret = repsep_snprintf(bf + printed, size - printed, "%-*.*s",
>> + width - printed, width - printed, "");
>> + if (ret < 0)
>> + return ret;
>> + printed += ret;
>> + }
>> + return printed;
>> +}
>> +
>> +struct sort_entry sort_commIgnoreDigit = {
>> + .se_header = "CommandIgnoreDigit",
>> + .se_cmp = sort__commIgnoreDigit_cmp,
>> + .se_collapse = sort__commIgnoreDigit_collapse,
>> + .se_sort = sort__commIgnoreDigit_sort,
>> + .se_snprintf = hist_entry__commIgnoreDigit_snprintf,
>> + .se_filter = hist_entry__thread_filter,
>> + .se_width_idx = HISTC_COMM_IGNORE_DIGIT,
>> +};
>> +
>> /* --sort dso */
>>
>> static int64_t _sort__dso_cmp(struct map *map_l, struct map *map_r)
>> @@ -2576,6 +2660,7 @@ static struct sort_dimension common_sort_dimensions[] = {
>> DIM(SORT_PID, "pid", sort_thread),
>> DIM(SORT_TGID, "tgid", sort_tgid),
>> DIM(SORT_COMM, "comm", sort_comm),
>> + DIM(SORT_COMM_IGNORE_DIGIT, "commIgnoreDigit", sort_commIgnoreDigit),
>> DIM(SORT_DSO, "dso", sort_dso),
>> DIM(SORT_SYM, "symbol", sort_sym),
>> DIM(SORT_PARENT, "parent", sort_parent),
>> @@ -3675,7 +3760,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
>> list->socket = 1;
>> } else if (sd->entry == &sort_thread) {
>> list->thread = 1;
>> - } else if (sd->entry == &sort_comm) {
>> + } else if (sd->entry == &sort_comm || sd->entry == &sort_commIgnoreDigit) {
>> list->comm = 1;
>> } else if (sd->entry == &sort_type_offset) {
>> symbol_conf.annotate_data_member = true;
>> @@ -4022,6 +4107,7 @@ static bool get_elide(int idx, FILE *output)
>> case HISTC_DSO:
>> return __get_elide(symbol_conf.dso_list, "dso", output);
>> case HISTC_COMM:
>> + case HISTC_COMM_IGNORE_DIGIT:
>> return __get_elide(symbol_conf.comm_list, "comm", output);
>> default:
>> break;
>> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
>> index d7787958e06b9..6819934b4d48a 100644
>> --- a/tools/perf/util/sort.h
>> +++ b/tools/perf/util/sort.h
>> @@ -43,6 +43,7 @@ enum sort_type {
>> /* common sort keys */
>> SORT_PID,
>> SORT_COMM,
>> + SORT_COMM_IGNORE_DIGIT,
>> SORT_DSO,
>> SORT_SYM,
>> SORT_PARENT,
>> --
>> 2.47.3
>>
Thread overview: 3+ messages
2026-02-17 19:08 Question: perf report & top memory usage Stephen Brennan
2026-02-18 1:52 ` Namhyung Kim
2026-03-05 18:02 ` Stephen Brennan [this message]