Re: perf performance with libdw vs libunwind

From: Guilherme Amadio <amadio@gentoo.org>
To: Ian Rogers <irogers@google.com>
Cc: acme@kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, libunwind-devel@nongnu.org
Subject: Re: perf performance with libdw vs libunwind
Date: Mon, 27 Apr 2026 09:12:55 +0200
Message-ID: <ae8Md2yPM4ynDdRn@gentoo.org>
In-Reply-To: <CAP-5=fVoC9R05rGGt=Q4XowbJ6Cr8xGVpQEsh-K4WN0myNkOJw@mail.gmail.com>

Hi Ian,

On Thu, Apr 23, 2026 at 03:28:15PM -0700, Ian Rogers wrote:
> On Thu, Apr 23, 2026 at 2:49 AM Guilherme Amadio <amadio@gentoo.org> wrote:
> >
> > Hi Ian,
> >
> > On Wed, Apr 22, 2026 at 09:21:52PM -0700, Ian Rogers wrote:
> > > Hi Guilherme,
> > >
> > > Thanks for the feedback but I'm a little confused. Your .perfconfig is
> > > set to use frame-pointer-based unwinding, so neither libunwind nor
> > > libdw should be used for unwinding. With framepointer unwinding, a
> > > sample contains an array of IPs gathered by walking the linked list of
> > > frame pointers on the stack. With --call-graph=dwarf a region of
> > > memory is copied into a sample (the stack) along with some initial
> > > register values, libdw or libunwind is then used to process this
> > > memory using the dwarf information in the ELF binary.
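
To make the distinction above concrete, here is a minimal Python sketch of
frame-pointer unwinding over a simulated stack. It assumes an x86-64-style
layout in which each frame stores the caller's frame pointer at [fp] and the
return address at [fp + 8]. The addresses, the walk_frame_pointers name, and
the max_depth limit are illustrative assumptions, not taken from perf or the
kernel.

```python
def walk_frame_pointers(memory, ip, fp, max_depth=127):
    """Collect a call chain by following saved frame pointers.

    memory maps an address to the 8-byte value stored there (a toy model).
    """
    chain = [ip]  # the sampled instruction pointer comes first
    while fp != 0 and len(chain) < max_depth:
        return_address = memory.get(fp + 8)
        caller_fp = memory.get(fp)
        if return_address is None or caller_fp is None:
            break  # frame is outside the modelled stack: stop walking
        chain.append(return_address)
        if caller_fp != 0 and caller_fp <= fp:
            break  # stack grows down, so a caller's frame must sit higher
        fp = caller_fp
    return chain


# Toy stack with three frames: leaf -> middle -> main.
stack = {
    0x7000: 0x7020, 0x7008: 0x401200,  # leaf frame: saved fp, return into middle
    0x7020: 0x7040, 0x7028: 0x401100,  # middle frame: saved fp, return into main
    0x7040: 0x0,    0x7048: 0x401000,  # main frame: chain ends with fp == 0
}
chain = walk_frame_pointers(stack, ip=0x401300, fp=0x7000)
print([hex(a) for a in chain])
```

The result is only an array of IPs, which is why fp samples are small. With
--call-graph=dwarf the sample instead carries a copy of the stack memory and
the initial registers, and the walk is done later from the DWARF information.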
> >
> > Thanks for your reply and pardon my ignorance, I thought that the libraries
> > were used generically for stack unwinding, regardless of whether it's fp or dwarf.
> > I should have looked a bit deeper before reporting this, but we are on the
> > right track.
> >
> > > So something that changed in v7.0: with dwarf unwinding via libdw or
> > > libunwind we always had all inline functions on the stack, but not
> > > with frame pointers. The IP in the frame pointer array can be that
> > > of an instruction within an inlined function. In v7.0 we added a patch
> > > that includes inline information for both frame pointer and LBR-based
> > > stack traces:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/machine.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf
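
As a rough illustration of what adding inline information to an fp or LBR
stack means, the sketch below expands a single IP into the chain of inlined
functions that cover it. The address ranges, function names, and the
expand_inlines helper are hypothetical. The real lookup reads this from the
debug information and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class InlinedRange:
    low: int      # first address covered by the inlined body
    high: int     # one past the last covered address
    name: str     # name of the inlined function
    depth: int    # nesting depth, 1 = inlined directly into the outer function


def expand_inlines(ip, outer_function, ranges):
    """Return frames for one IP, innermost first, ending with the real function."""
    hits = [r for r in ranges if r.low <= ip < r.high]
    hits.sort(key=lambda r: r.depth, reverse=True)
    return [f"{r.name} (inlined)" for r in hits] + [outer_function]


# Hypothetical layout: helper() is inlined into compute(), which is inlined into main().
ranges = [
    InlinedRange(0x1010, 0x1080, "compute", 1),
    InlinedRange(0x1020, 0x1040, "helper", 2),
]
print(expand_inlines(0x1030, "main", ranges))
print(expand_inlines(0x1090, "main", ranges))
```

One IP from the frame pointer array can therefore become several report
entries, and each IP needs a debug-information lookup to find them.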
> >
> > This is a nice development, I have been using --call-graph=dwarf to see
> > the inlined symbols, so having the ability to see inlined functions with
> > fp unwinding, which is much more lightweight in terms of space (i.e. the
> > size of the final perf.data files), is great.
> >
> > > By default we try to add inline information using libdw; if that fails,
> > > we try llvm, then libbfd, and finally the command line addr2line tool:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/srcline.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf#n145
> > > I suspect the slowdown comes from doing all this addr2line work on a
> > > binary that's been stripped. The good news here is that you can add
> > > a config option to avoid all the fallbacks, "addr2line.style=libdw".
> > > You can also disable inline information by adding "--no-inline" to
> > > your `perf report` command line.
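
The fallback order described above can be sketched as follows. This is only
a model of the behaviour as stated in the message, not the code in srcline.c:
the lookup_inlines function and the stub backends are hypothetical, and the
style argument stands in for the addr2line.style setting.

```python
DEFAULT_ORDER = ["libdw", "llvm", "libbfd", "addr2line"]


def lookup_inlines(address, backends, style=None):
    """Try backends in order and return (backend_name, result, tried)."""
    order = [style] if style else DEFAULT_ORDER
    tried = []
    for name in order:
        tried.append(name)
        result = backends[name](address)
        if result is not None:
            return name, result, tried
    return None, None, tried


# Hypothetical backends: only the external addr2line tool "succeeds" here.
backends = {
    "libdw": lambda addr: None,
    "llvm": lambda addr: None,
    "libbfd": lambda addr: None,
    "addr2line": lambda addr: ["file.c:10"],
}
print(lookup_inlines(0x1234, backends))
print(lookup_inlines(0x1234, backends, style="libdw"))
```

In this model, an address that libdw cannot resolve costs four attempts by
default but only one when the style is pinned to libdw.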
> >
> > The binary and its direct dependent libraries, as well as most other dependencies
> > are not stripped, but it's possible that some dependency in the full chain might be.
> >
> > When I run perf report using --no-inline I indeed recover the performance I
> > had before with perf-6.19.12. However, setting addr2line.style=libdw did not
> > help much. Here is what I observe:
> >
> > $ perf config
> > call-graph.record-mode=fp
> > $ perf version
> > perf version 7.0
> > $ time perf record -g -F max -e cycles:uk -- root.exe -l -q
> > info: Using a maximum frequency rate of 63800 Hz
> > [ perf record: Woken up 18 times to write data ]
> > [ perf record: Captured and wrote 4.720 MB perf.data (16552 samples) ]
> > 1.26
> > $ time perf report -q --stdio -g none --children --no-inline --percent-limit 75
> >     92.65%     0.00%  root.exe  root.exe              [.] main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_main@@GLIBC_2.34
> >     92.64%     0.00%  root.exe  root.exe              [.] _start
> >     91.63%     0.00%  root.exe  libCore.so.6.38.04    [.] ROOT::GetROOT()
> >     89.46%     0.00%  root.exe  libRint.so.6.38.04    [.] TRint::TRint(char const*, int*, char**, void*, int, bool, bool)
> >     88.31%     0.00%  root.exe  libCore.so.6.38.04    [.] TApplication::TApplication(char const*, int*, char**, void*, int)
> >     88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] ROOT::Internal::GetROOT2()
> >     88.20%     0.00%  root.exe  libCore.so.6.38.04    [.] TROOT::InitInterpreter()
> >     76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] CreateInterpreter
> >     76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] TCling::TCling(char const*, char const*, char const* const*, void*)
> >
> > 1.70
> > Flamegraph: https://amadio.web.cern.ch/perf/perf-report-noinline.svg
> 
> Thanks for all the reporting!
> So here the C++ demangler is about 1/3rd of execution time and there's
> no dwarf decoding for the inline functions.
> 
> > Now without --no-inline, and this first command is without addr2line.style=libdw in the config:
> >
> > $ time perf report -q --stdio -g none --children --percent-limit 75
> >     92.65%     0.00%  root.exe  root.exe              [.] main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
> >     92.64%     0.00%  root.exe  root.exe              [.] _start
> >     88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] GetROOT2 (inlined)
> >     76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] CreateInterpreter
> >
> > 241.60
> > Flamegraph: https://amadio.web.cern.ch/perf/perf-report-addr2line.svg
> 
> Here perf first tries libdw for the addr2line lookup and then falls back
> to the addr2line command. Time is mainly spent gathering addr2line
> inline information.
> 
> > $ perf config addr2line.style=libdw
> > $ perf config
> > call-graph.record-mode=fp
> > addr2line.style=libdw
> > $ time perf report -q --stdio -g none --children --percent-limit 75
> >     92.65%     0.00%  root.exe  root.exe              [.] main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
> >     92.64%     0.00%  root.exe  root.exe              [.] _start
> >     88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] GetROOT2 (inlined)
> >
> > 137.93
> > Flamegraph: https://amadio.web.cern.ch/perf/perf-report-libdw.svg
> 
> Here time is just spent in libdw.
> 
> > The flame graphs above are for the perf-report commands themselves.
> >
> > So, the performance is fine with --no-inline, and it's better with addr2line.style=libdw
> > than with the default. However, the function names in the last two reports are worse
> > (e.g. ROOT::Internal::GetROOT2() shows up as "GetROOT2 (inlined)"), so this problem remains.
> >
> > > Your report suggests we should tweak the defaults for showing inline
> > > information. Could you try the options I've suggested and see if they
> > > remedy the issue for you?
> >
> > Thank you for the suggestions. Indeed --no-inline seems to bring back the
> > previous performance. Please let me know if you would like me to try more
> > things and what other information you need for the cases without --no-inline.
> 
> Performance-wise, things are working as expected. I'm confused about
> why we see different symbol names, perhaps this points to a libdw bug.
> With or without --no-inline libelf gets the symbol name, libdw is only
> used to get the source line and inlining information. Perhaps this is
> more of a bug with `-g none`, which is an option I've never used. I'm
> quite busy at the moment, so it's not easy for me to dig into this.
> Perhaps we can create a test and try to get an LLM to investigate it.

Thank you for your explanations too. I will use --no-inline for now. The
option -g none just produces a flat profile even if you recorded with -g.
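
For reference, the perf report timings quoted above can be compared with a
few lines of Python. The numbers are copied from the runs in this thread.
The unit is assumed to be seconds, since the time output shows bare numbers,
and the dictionary labels are only descriptive.

```python
# Values printed by `time` in the runs quoted above (unit assumed to be seconds).
timings = {
    "--no-inline": 1.70,
    "default (libdw + addr2line fallbacks)": 241.60,
    "addr2line.style=libdw": 137.93,
}

baseline = timings["--no-inline"]
slowdown = {name: value / baseline for name, value in timings.items()}

for name, factor in slowdown.items():
    print(f"{name:40s} {timings[name]:8.2f}  {factor:6.1f}x")
```

Relative to --no-inline, the default configuration is roughly 142 times
slower and addr2line.style=libdw roughly 81 times slower on this profile.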

Best regards,
-Guilherme

Thread overview:
2026-04-22 13:26 perf performance with libdw vs libunwind Guilherme Amadio
2026-04-23  4:21 ` Ian Rogers
2026-04-23  9:49   ` Guilherme Amadio
2026-04-23 22:28     ` Ian Rogers
2026-04-27  7:12       ` Guilherme Amadio [this message]
