Date: Tue, 20 Jan 2026 17:18:48 -0300
From: Arnaldo Carvalho de Melo
To: Namhyung Kim
Cc: Ian Rogers, James Clark, Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v3 1/2] perf inject: Add --convert-callchain option
References: <20260113232903.55021-1-namhyung@kernel.org>
In-Reply-To: <20260113232903.55021-1-namhyung@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 13, 2026 at 03:29:02PM -0800, Namhyung Kim wrote:
> There are applications not built with frame pointers, so DWARF is needed
> to get the stack traces. So `perf record --call-graph dwarf` saves the
> stack and register data for each sample to get the stacktrace offline.
> But sometimes those data may have sensitive information and we don't
> want to keep them in the file.
>
> This perf inject --convert-callchain option parses the callchains and
> discards the stack and register data after that. This will save storage
> space and processing time for the new data file. Of course, users should
> remove the original data file. :)
>
> The downside is that it cannot handle inlined callchain entries as they
> all have the same IPs. Maybe we can add an option to perf report to
> look up inlined functions using DWARF - IIUC it won't require stack and
> register data.

Thanks, applied to perf-tools-next,

- Arnaldo

> This is an example.
>
>   $ perf record --call-graph dwarf -- perf test -w noploop
>
>   $ perf report --stdio --no-children --percent-limit=0 > output-prev
>
>   $ perf inject -i perf.data --convert-callchain -o perf.data.out
>
>   $ perf report --stdio --no-children --percent-limit=0 -i perf.data.out > output-next
>
>   $ diff -u output-prev output-next
>   ...
>        0.23%  perf     ld-linux-x86-64.so.2  [.] _dl_relocate_object_no_relro
>               |
>   -           ---elf_dynamic_do_Rela (inlined)
>   -              _dl_relocate_object_no_relro
>   +           ---_dl_relocate_object_no_relro
>                  _dl_relocate_object
>                  dl_main
>                  _dl_sysdep_start
>   -              _dl_start_final (inlined)
>                  _dl_start
>                  _start
>
> Reviewed-by: Ian Rogers
> Signed-off-by: Namhyung Kim
> ---
> v3 changes)
>  * add Ian's reviewed-by tag
>  * handle error cases properly  (Arnaldo)
>
>  tools/perf/Documentation/perf-inject.txt |   5 +
>  tools/perf/builtin-inject.c              | 152 +++++++++++++++++++++++
>  2 files changed, 157 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
> index c972032f4ca0d248..95dfdf39666efe89 100644
> --- a/tools/perf/Documentation/perf-inject.txt
> +++ b/tools/perf/Documentation/perf-inject.txt
> @@ -109,6 +109,11 @@ include::itrace.txt[]
>  	should be used, and also --buildid-all and --switch-events may be
>  	useful.
>
> +--convert-callchain::
> +	Parse DWARF callchains and convert them to usual callchains.  This also
> +	discards stack and register data from the samples.  This will lose
> +	inlined callchain entries.
> +
>  :GMEXAMPLECMD: inject
>  :GMEXAMPLESUBCMD:
>  include::guestmount.txt[]
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 6080afec537d2178..e2a653280e1b8361 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -122,6 +122,7 @@ struct perf_inject {
>  	bool			in_place_update;
>  	bool			in_place_update_dry_run;
>  	bool			copy_kcore_dir;
> +	bool			convert_callchain;
>  	const char		*input_name;
>  	struct perf_data	output;
>  	u64			bytes_written;
> @@ -133,6 +134,7 @@ struct perf_inject {
>  	struct guest_session	guest_session;
>  	struct strlist		*known_build_ids;
>  	const struct evsel	*mmap_evsel;
> +	struct ip_callchain	*raw_callchain;
>  };
>
>  struct event_entry {
> @@ -383,6 +385,90 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
>  	return perf_event__repipe_synth(tool, event);
>  }
>
> +static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
> +						union perf_event *event,
> +						struct perf_sample *sample,
> +						struct evsel *evsel,
> +						struct machine *machine)
> +{
> +	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
> +	struct callchain_cursor *cursor = get_tls_callchain_cursor();
> +	union perf_event *event_copy = (void *)inject->event_copy;
> +	struct callchain_cursor_node *node;
> +	struct thread *thread;
> +	u64 sample_type = evsel->core.attr.sample_type;
> +	u32 sample_size = event->header.size;
> +	u64 i, k;
> +	int ret;
> +
> +	if (event_copy == NULL) {
> +		inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE);
> +		if (!inject->event_copy)
> +			return -ENOMEM;
> +
> +		event_copy = (void *)inject->event_copy;
> +	}
> +
> +	if (cursor == NULL)
> +		return -ENOMEM;
> +
> +	callchain_cursor_reset(cursor);
> +
> +	thread = machine__find_thread(machine, sample->tid, sample->pid);
> +	if (thread == NULL)
> +		goto out;
> +
> +	/* this will parse DWARF using stack and register data */
> +	ret = thread__resolve_callchain(thread, cursor, evsel, sample,
> +					/*parent=*/NULL, /*root_al=*/NULL,
> +					PERF_MAX_STACK_DEPTH);
> +	thread__put(thread);
> +	if (ret != 0)
> +		goto out;
> +
> +	/* copy kernel callchain and context entries */
> +	for (i = 0; i < sample->callchain->nr; i++) {
> +		inject->raw_callchain->ips[i] = sample->callchain->ips[i];
> +		if (sample->callchain->ips[i] == PERF_CONTEXT_USER) {
> +			i++;
> +			break;
> +		}
> +	}
> +	if (i == 0 || inject->raw_callchain->ips[i - 1] != PERF_CONTEXT_USER)
> +		inject->raw_callchain->ips[i++] = PERF_CONTEXT_USER;
> +
> +	node = cursor->first;
> +	for (k = 0; k < cursor->nr && i < PERF_MAX_STACK_DEPTH; k++) {
> +		if (machine__kernel_ip(machine, node->ip))
> +			/* kernel IPs were added already */;
> +		else if (node->ms.sym && node->ms.sym->inlined)
> +			/* we can't handle inlined callchains */;
> +		else
> +			inject->raw_callchain->ips[i++] = node->ip;
> +
> +		node = node->next;
> +	}
> +
> +	inject->raw_callchain->nr = i;
> +	sample->callchain = inject->raw_callchain;
> +
> +out:
> +	memcpy(event_copy, event, sizeof(event->header));
> +
> +	/* adjust sample size for stack and regs */
> +	sample_size -= sample->user_stack.size;
> +	sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
> +	sample_size += (sample->callchain->nr + 1) * sizeof(u64);
> +	event_copy->header.size = sample_size;
> +
> +	/* remove sample_type {STACK,REGS}_USER for synthesize */
> +	sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
> +
> +	perf_event__synthesize_sample(event_copy, sample_type,
> +				      evsel->core.attr.read_format, sample);
> +	return perf_event__repipe_synth(tool, event_copy);
> +}
> +
>  static struct dso *findnew_dso(int pid, int tid, const char *filename,
>  			       const struct dso_id *id, struct machine *machine)
>  {
> @@ -2270,6 +2356,15 @@ static int __cmd_inject(struct perf_inject *inject)
>  		/* Allow space in the header for guest attributes */
>  		output_data_offset += gs->session->header.data_offset;
>  		output_data_offset = roundup(output_data_offset, 4096);
> +	} else if (inject->convert_callchain) {
> +		inject->tool.sample		= perf_event__convert_sample_callchain;
> +		inject->tool.fork		= perf_event__repipe_fork;
> +		inject->tool.comm		= perf_event__repipe_comm;
> +		inject->tool.exit		= perf_event__repipe_exit;
> +		inject->tool.mmap		= perf_event__repipe_mmap;
> +		inject->tool.mmap2		= perf_event__repipe_mmap2;
> +		inject->tool.ordered_events	= true;
> +		inject->tool.ordering_requires_timestamps = true;
>  	}
>
>  	if (!inject->itrace_synth_opts.set)
> @@ -2322,6 +2417,23 @@ static int __cmd_inject(struct perf_inject *inject)
>  			perf_header__set_feat(&session->header,
>  					      HEADER_BRANCH_STACK);
>  		}
> +
> +		/*
> +		 * The converted data file won't have stack and registers.
> +		 * Update the perf_event_attr to remove them before writing.
> +		 */
> +		if (inject->convert_callchain) {
> +			struct evsel *evsel;
> +
> +			evlist__for_each_entry(session->evlist, evsel) {
> +				evsel__reset_sample_bit(evsel, REGS_USER);
> +				evsel__reset_sample_bit(evsel, STACK_USER);
> +				evsel->core.attr.sample_regs_user = 0;
> +				evsel->core.attr.sample_stack_user = 0;
> +				evsel->core.attr.exclude_callchain_user = 0;
> +			}
> +		}
> +
>  		session->header.data_offset = output_data_offset;
>  		session->header.data_size = inject->bytes_written;
>  		perf_session__inject_header(session, session->evlist, fd, &inj_fc.fc,
> @@ -2346,6 +2458,18 @@ static int __cmd_inject(struct perf_inject *inject)
>  	return ret;
>  }
>
> +static bool evsel__has_dwarf_callchain(struct evsel *evsel)
> +{
> +	struct perf_event_attr *attr = &evsel->core.attr;
> +	const u64 dwarf_callchain_flags =
> +		PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER | PERF_SAMPLE_CALLCHAIN;
> +
> +	if (!attr->exclude_callchain_user)
> +		return false;
> +
> +	return (attr->sample_type & dwarf_callchain_flags) == dwarf_callchain_flags;
> +}
> +
>  int cmd_inject(int argc, const char **argv)
>  {
>  	struct perf_inject inject = {
> @@ -2414,6 +2538,8 @@ int cmd_inject(int argc, const char **argv)
>  		OPT_STRING(0, "guestmount", &symbol_conf.guestmount, "directory",
>  			   "guest mount directory under which every guest os"
>  			   " instance has a subdir"),
> +		OPT_BOOLEAN(0, "convert-callchain", &inject.convert_callchain,
> +			    "Generate callchains using DWARF and drop register/stack data"),
>  		OPT_END()
>  	};
>  	const char * const inject_usage[] = {
> @@ -2429,6 +2555,9 @@ int cmd_inject(int argc, const char **argv)
>
>  #ifndef HAVE_JITDUMP
>  	set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
> +#endif
> +#ifndef HAVE_LIBDW_SUPPORT
> +	set_option_nobuild(options, 0, "convert-callchain", "NO_LIBDW=1", true);
>  #endif
>  	argc = parse_options(argc, argv, options, inject_usage, 0);
>
> @@ -2588,6 +2717,28 @@ int cmd_inject(int argc, const char **argv)
>  		}
>  	}
>
> +	if (inject.convert_callchain) {
> +		struct evsel *evsel;
> +
> +		if (inject.output.is_pipe || inject.session->data->is_pipe) {
> +			pr_err("--convert-callchain cannot work with pipe\n");
> +			goto out_delete;
> +		}
> +
> +		evlist__for_each_entry(inject.session->evlist, evsel) {
> +			if (!evsel__has_dwarf_callchain(evsel)) {
> +				pr_err("--convert-callchain requires DWARF call graph.\n");
> +				goto out_delete;
> +			}
> +		}
> +
> +		inject.raw_callchain = calloc(PERF_MAX_STACK_DEPTH, sizeof(u64));
> +		if (inject.raw_callchain == NULL) {
> +			pr_err("callchain allocation failed\n");
> +			goto out_delete;
> +		}
> +	}
> +
>  #ifdef HAVE_JITDUMP
>  	if (inject.jit_mode) {
>  		inject.tool.mmap2	    = perf_event__repipe_mmap2;
> @@ -2618,5 +2769,6 @@ int cmd_inject(int argc, const char **argv)
>  	free(inject.itrace_synth_opts.vm_tm_corr_args);
>  	free(inject.event_copy);
>  	free(inject.guest_session.ev.event_buf);
> +	free(inject.raw_callchain);
>  	return ret;
>  }
> --
> 2.52.0.457.g6b5491de43-goog
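
For anyone reading along, the conversion step in perf_event__convert_sample_callchain() can be sketched outside of perf roughly as below. The recorded (kernel-side) chain is copied up to a PERF_CONTEXT_USER marker, a marker is appended if the recorded chain had none, and then the DWARF-resolved user IPs are appended while kernel and inlined entries are skipped. This is only an illustration under stated assumptions, not the perf code: struct resolved_node, merge_callchain() and the SKETCH_* names are made up for the example, the resolved chain is a plain array instead of a callchain_cursor, and the first loop carries an extra bound that I added for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as PERF_CONTEXT_USER and PERF_MAX_STACK_DEPTH. */
#define SKETCH_CONTEXT_USER	((uint64_t)-512)
#define SKETCH_MAX_DEPTH	127

/* Hypothetical stand-in for one resolved callchain cursor node. */
struct resolved_node {
	uint64_t ip;
	bool kernel;	/* IP belongs to the kernel */
	bool inlined;	/* entry is an inlined function (shares its caller's IP) */
};

/*
 * Build a plain IP callchain in 'out' (room for SKETCH_MAX_DEPTH entries).
 * Returns the number of entries written.
 */
static size_t merge_callchain(const uint64_t *raw, size_t raw_nr,
			      const struct resolved_node *nodes, size_t nodes_nr,
			      uint64_t *out)
{
	size_t i, k;

	assert(out != NULL);

	/* copy kernel callchain and context entries, stop after the user marker */
	for (i = 0; i < raw_nr && i < SKETCH_MAX_DEPTH - 1; i++) {
		out[i] = raw[i];
		if (raw[i] == SKETCH_CONTEXT_USER) {
			i++;
			break;
		}
	}

	/* make sure the user part starts with a context marker */
	if (i == 0 || out[i - 1] != SKETCH_CONTEXT_USER)
		out[i++] = SKETCH_CONTEXT_USER;

	/* append resolved user IPs; kernel IPs are already there, inlines are dropped */
	for (k = 0; k < nodes_nr && i < SKETCH_MAX_DEPTH; k++) {
		if (nodes[k].kernel || nodes[k].inlined)
			continue;
		out[i++] = nodes[k].ip;
	}

	return i;
}
```

Calling merge_callchain() with a recorded chain that holds only a kernel context marker and two kernel IPs, plus a resolved chain of two kernel nodes, one inlined user node and two regular user nodes, yields six entries: the three recorded ones, the user marker, and the two regular user IPs. The inlined node disappears, which matches the loss of the "(inlined)" lines in the diff output above.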