From: Guilherme Amadio <amadio@gentoo.org>
To: Ian Rogers <irogers@google.com>
Cc: acme@kernel.org, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, libunwind-devel@nongnu.org
Subject: Re: perf performance with libdw vs libunwind
Date: Thu, 23 Apr 2026 11:49:16 +0200
Message-ID: <aenrHEOZDiBKG4Sy@gentoo.org>
In-Reply-To: <CAP-5=fX6PT2FsvAoy+0AGq3AFEQ9XU_HBhGkZdX0fn0Ltss-JA@mail.gmail.com>
Hi Ian,
On Wed, Apr 22, 2026 at 09:21:52PM -0700, Ian Rogers wrote:
> Hi Guilherme,
>
> Thanks for the feedback but I'm a little confused. Your .perfconfig is
> set to use frame-pointer-based unwinding, so neither libunwind nor
> libdw should be used for unwinding. With framepointer unwinding, a
> sample contains an array of IPs gathered by walking the linked list of
> frame pointers on the stack. With --call-graph=dwarf a region of
> memory is copied into a sample (the stack) along with some initial
> register values, libdw or libunwind is then used to process this
> memory using the dwarf information in the ELF binary.
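A toy model of the frame-pointer walk described above, written in bash purely for illustration. The "mem" array, the addresses, the frame layout (saved frame pointer at mem[fp], return address at mem[fp+1]) and the fp_unwind helper are all hypothetical; this is not perf or kernel code.

```bash
#!/usr/bin/env bash
# Toy model of frame-pointer unwinding. "Memory" is a sparse bash array.
# Each frame record keeps the caller's saved frame pointer at mem[fp] and
# the return address at mem[fp+1]; a saved frame pointer of 0 ends the chain.
# All addresses and offsets below are made up.

mem=()
mem[10]=20; mem[11]=0x401a10   # innermost frame record
mem[20]=30; mem[21]=0x401b20
mem[30]=0;  mem[31]=0x401c30   # outermost frame record: chain ends here

# Print the sampled IP, then one return address per frame record,
# stopping at a null frame pointer or at a hypothetical depth cap.
fp_unwind() {
    local ip=$1 fp=$2 max_depth=${3:-127} depth=0
    echo "$ip"
    while [ "$fp" -ne 0 ] && [ "$depth" -lt "$max_depth" ]; do
        echo "${mem[fp+1]}"   # return address stored in this frame record
        fp=${mem[fp]}         # follow the saved frame pointer to the caller
        depth=$((depth + 1))
    done
}

# Prints 0x401000, 0x401a10, 0x401b20, 0x401c30 (one per line).
fp_unwind 0x401000 10
```

Each printed address stands for one physical frame only. If an IP falls inside a function that was inlined into its caller, the walk still yields a single entry for it, which is why extra debug-info lookups are needed to expand inline frames for fp (and LBR) call chains.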
Thanks for your reply, and pardon my ignorance: I thought the libraries
were used for stack unwinding in general, regardless of whether it's fp or dwarf.
I should have looked a bit deeper before reporting this, but we are on the
right track.
> So something that changed in v7.0 is that with the dwarf libdw or
> libunwind unwinding we always had all inline functions on the stack
> but not with frame pointers. The IP in the frame pointer array can be
> of an instruction within an inlined function. In v7.0 we added a patch
> that includes inline information for both frame pointer and LBR-based
> stack traces:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/machine.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf
This is a nice development. I have been using --call-graph=dwarf to see
inlined symbols, so being able to see inlined functions with fp unwinding,
which is much more lightweight in terms of space (i.e., the size of the
resulting perf.data files), is great.
> By default we try to add inline information using libdw; if that fails,
> we try llvm, then libbfd, and finally the command line addr2line tool:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/srcline.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf#n145
> I suspect the slow down is for doing all this addr2line work on a
> binary that's been stripped. The good news here is that you can add
> a config option to avoid all the fallbacks, "addr2line.style=libdw".
> You can also disable inline information by adding "--no-inline" to
> your `perf report` command line.
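A toy model of the fallback order described above, in bash. Every function here (try_libdw, try_llvm, try_libbfd, try_addr2line, have_debug_info, resolve_inline) and the stripped/attempts variables are hypothetical stand-ins, not perf internals. The model only counts lookup attempts per address; it says nothing about how expensive each real lookup is.

```bash
#!/usr/bin/env bash
# Hypothetical model: stripped=1 means no resolver can find inline info.
stripped=1
attempts=0   # total resolver attempts made so far

have_debug_info() { [ "$stripped" -eq 0 ]; }

# Stand-ins for the four resolvers: print a result and succeed, or fail.
try_libdw()     { have_debug_info && echo "libdw:$1"; }
try_llvm()      { have_debug_info && echo "llvm:$1"; }
try_libbfd()    { have_debug_info && echo "libbfd:$1"; }
try_addr2line() { have_debug_info && echo "addr2line:$1"; }

# Try resolvers in the default order, stopping at the first success.
# An optional second argument pins a single resolver, loosely mirroring
# what addr2line.style=libdw is meant to do.
resolve_inline() {
    local addr=$1 style=${2:-}
    local order=(libdw llvm libbfd addr2line)
    if [ -n "$style" ]; then
        order=("$style")
    fi
    local r
    for r in "${order[@]}"; do
        attempts=$((attempts + 1))
        if "try_$r" "$addr"; then
            return 0
        fi
    done
    return 1
}

# Prints "unresolved after 4 attempts".
resolve_inline 0x401a10 || echo "unresolved after $attempts attempts"
```

With the default order, an address that nothing can resolve costs four attempts, while pinning one style costs one. That is the saving the config option targets; the model cannot capture how costly each individual libdw lookup is.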
The binary and its direct dependencies, as well as most other dependencies, are
not stripped, but some library further down the dependency chain might be.
When I run perf report with --no-inline I do recover the performance I
had with perf-6.19.12. However, setting addr2line.style=libdw helped much
less. Here is what I observe:
$ perf config
call-graph.record-mode=fp
$ perf version
perf version 7.0
$ time perf record -g -F max -e cycles:uk -- root.exe -l -q
info: Using a maximum frequency rate of 63800 Hz
[ perf record: Woken up 18 times to write data ]
[ perf record: Captured and wrote 4.720 MB perf.data (16552 samples) ]
1.26
$ time perf report -q --stdio -g none --children --no-inline --percent-limit 75
92.65% 0.00% root.exe root.exe [.] main
92.64% 0.00% root.exe libc.so.6 [.] __libc_start_call_main
92.64% 0.00% root.exe libc.so.6 [.] __libc_start_main@@GLIBC_2.34
92.64% 0.00% root.exe root.exe [.] _start
91.63% 0.00% root.exe libCore.so.6.38.04 [.] ROOT::GetROOT()
89.46% 0.00% root.exe libRint.so.6.38.04 [.] TRint::TRint(char const*, int*, char**, void*, int, bool, bool)
88.31% 0.00% root.exe libCore.so.6.38.04 [.] TApplication::TApplication(char const*, int*, char**, void*, int)
88.22% 0.00% root.exe libCore.so.6.38.04 [.] ROOT::Internal::GetROOT2()
88.20% 0.00% root.exe libCore.so.6.38.04 [.] TROOT::InitInterpreter()
76.34% 0.00% root.exe libCling.so.6.38.04 [.] CreateInterpreter
76.34% 0.00% root.exe libCling.so.6.38.04 [.] TCling::TCling(char const*, char const*, char const* const*, void*)
1.70
Flamegraph: https://amadio.web.cern.ch/perf/perf-report-noinline.svg
Now without --no-inline; this first run does not have addr2line.style=libdw in the config:
$ time perf report -q --stdio -g none --children --percent-limit 75
92.65% 0.00% root.exe root.exe [.] main
92.64% 0.00% root.exe libc.so.6 [.] __libc_start_call_main
92.64% 0.00% root.exe root.exe [.] _start
88.22% 0.00% root.exe libCore.so.6.38.04 [.] GetROOT2 (inlined)
76.34% 0.00% root.exe libCling.so.6.38.04 [.] CreateInterpreter
241.60
Flamegraph: https://amadio.web.cern.ch/perf/perf-report-addr2line.svg
$ perf config addr2line.style=libdw
$ perf config
call-graph.record-mode=fp
addr2line.style=libdw
$ time perf report -q --stdio -g none --children --percent-limit 75
92.65% 0.00% root.exe root.exe [.] main
92.64% 0.00% root.exe libc.so.6 [.] __libc_start_call_main
92.64% 0.00% root.exe root.exe [.] _start
88.22% 0.00% root.exe libCore.so.6.38.04 [.] GetROOT2 (inlined)
137.93
Flamegraph: https://amadio.web.cern.ch/perf/perf-report-libdw.svg
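The flame graphs above profile the perf report commands themselves.
So performance is fine with --no-inline and improves with addr2line.style=libdw,
but the latter is still roughly 80 times slower (137.93 vs 1.70).
The symbol names in the last two reports are also worse: ROOT::Internal::GetROOT2()
shows up as "GetROOT2 (inlined)", and entries such as ROOT::GetROOT() and
TROOT::InitInterpreter() no longer appear, so this problem remains.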
> Your report suggests we should tweak the defaults for showing inline
> information. Could you try the options I've suggested and see if they
> remedy the issue for you?
Thank you for the suggestions. Indeed --no-inline seems to bring back the
previous performance. Please let me know if you would like me to try more
things and what other information you need for the cases without --no-inline.
Best regards,
-Guilherme