Re: perf performance with libdw vs libunwind


From: Guilherme Amadio <amadio@gentoo.org>
To: Ian Rogers <irogers@google.com>
Cc: acme@kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, libunwind-devel@nongnu.org
Subject: Re: perf performance with libdw vs libunwind
Date: Mon, 27 Apr 2026 09:12:55 +0200
Message-ID: <ae8Md2yPM4ynDdRn@gentoo.org>
In-Reply-To: <CAP-5=fVoC9R05rGGt=Q4XowbJ6Cr8xGVpQEsh-K4WN0myNkOJw@mail.gmail.com>

Hi Ian,

On Thu, Apr 23, 2026 at 03:28:15PM -0700, Ian Rogers wrote:
> On Thu, Apr 23, 2026 at 2:49 AM Guilherme Amadio <amadio@gentoo.org> wrote:
> >
> > Hi Ian,
> >
> > On Wed, Apr 22, 2026 at 09:21:52PM -0700, Ian Rogers wrote:
> > > Hi Guilherme,
> > >
> > > Thanks for the feedback but I'm a little confused. Your .perfconfig is
> > > set to use frame-pointer-based unwinding, so neither libunwind nor
> > > libdw should be used for unwinding. With framepointer unwinding, a
> > > sample contains an array of IPs gathered by walking the linked list of
> > > frame pointers on the stack. With --call-graph=dwarf a region of
> > > memory is copied into a sample (the stack) along with some initial
> > > register values, libdw or libunwind is then used to process this
> > > memory using the dwarf information in the ELF binary.
> >
> > Thanks for your reply, and pardon my ignorance: I thought that the libraries
> > were used generically for stack unwinding, regardless of whether it's fp or
> > dwarf. I should have looked a bit deeper before reporting this, but we are on
> > the right track.
> >
> > > So something that changed in v7.0: with dwarf unwinding (libdw or
> > > libunwind) we always had all inline functions on the stack, but not
> > > with frame pointers. The IP in the frame pointer array can be
> > > of an instruction within an inlined function. In v7.0 we added a patch
> > > that includes inline information for both frame pointer and LBR-based
> > > stack traces:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/machine.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf
> >
> > This is a nice development, I have been using --call-graph=dwarf to see
> > the inlined symbols, so having the ability to see inlined functions with
> > fp unwinding, which is much more lightweight in terms of space (i.e. the
> > size of the final perf.data files), is great.
> >
> > > By default we try to add inline information using libdw; if that fails
> > > we try llvm, then libbfd, and finally the command line addr2line tool:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/srcline.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf#n145
> > > I suspect the slowdown comes from doing all this addr2line work on a
> > > binary that's been stripped. The good news here is that you can add
> > > a config option to avoid all the fallbacks, "addr2line.style=libdw".
> > > You can also disable inline information by adding "--no-inline" to
> > > your `perf report` command line.
> >
> > The binary and its direct dependent libraries, as well as most other dependencies,
> > are not stripped, but it's possible that some dependency in the full chain might be.
> >
> > When I run perf report using --no-inline I indeed recover the performance I
> > had before with perf-6.19.12. However, setting addr2line.style=libdw did not
> > help much. Here is what I observe:
> >
> > $ perf config
> > call-graph.record-mode=fp
> > $ perf version
> > perf version 7.0
> > $ time perf record -g -F max -e cycles:uk -- root.exe -l -q
> > info: Using a maximum frequency rate of 63800 Hz
> > [ perf record: Woken up 18 times to write data ]
> > [ perf record: Captured and wrote 4.720 MB perf.data (16552 samples) ]
> > 1.26
> > $ time perf report -q --stdio -g none --children --no-inline --percent-limit 75
> >     92.65%     0.00%  root.exe  root.exe              [.] main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_main@@GLIBC_2.34
> >     92.64%     0.00%  root.exe  root.exe              [.] _start
> >     91.63%     0.00%  root.exe  libCore.so.6.38.04    [.] ROOT::GetROOT()
> >     89.46%     0.00%  root.exe  libRint.so.6.38.04    [.] TRint::TRint(char const*, int*, char**, void*, int, bool, bool)
> >     88.31%     0.00%  root.exe  libCore.so.6.38.04    [.] TApplication::TApplication(char const*, int*, char**, void*, int)
> >     88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] ROOT::Internal::GetROOT2()
> >     88.20%     0.00%  root.exe  libCore.so.6.38.04    [.] TROOT::InitInterpreter()
> >     76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] CreateInterpreter
> >     76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] TCling::TCling(char const*, char const*, char const* const*, void*)
> >
> > 1.70
> > Flamegraph: https://amadio.web.cern.ch/perf/perf-report-noinline.svg
> 
> Thanks for all the reporting!
> So here the C++ demangler is about 1/3rd of execution time and there's
> no dwarf decoding for the inline functions.
> 
> > Now without --no-inline, and this first command is without addr2line.style=libdw in the config:
> >
> > $ time perf report -q --stdio -g none --children --percent-limit 75
> >     92.65%     0.00%  root.exe  root.exe              [.] main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
> >     92.64%     0.00%  root.exe  root.exe              [.] _start
> >     88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] GetROOT2 (inlined)
> >     76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] CreateInterpreter
> >
> > 241.60
> > Flamegraph: https://amadio.web.cern.ch/perf/perf-report-addr2line.svg
> 
> Here perf first tries libdw for the addr2line lookup and then falls
> back to the addr2line command. Time is mainly spent gathering
> addr2line inline information.
> 
> > $ perf config addr2line.style=libdw
> > $ perf config
> > call-graph.record-mode=fp
> > addr2line.style=libdw
> > $ time perf report -q --stdio -g none --children --percent-limit 75
> >     92.65%     0.00%  root.exe  root.exe              [.] main
> >     92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
> >     92.64%     0.00%  root.exe  root.exe              [.] _start
> >     88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] GetROOT2 (inlined)
> >
> > 137.93
> > Flamegraph: https://amadio.web.cern.ch/perf/perf-report-libdw.svg
> 
> Here time is just spent in libdw.
> 
> > The flame graphs above are for the perf-report commands themselves.
> >
> > So, the performance is fine with --no-inline, and it's better with addr2line.style=libdw.
> > However, the function names are not the best in the last two reports, so this problem remains.
> >
> > > Your report suggests we should tweak the defaults for showing inline
> > > information. Could you try the options I've suggested and see if they
> > > remedy the issue for you?
> >
> > Thank you for the suggestions. Indeed --no-inline seems to bring back the
> > previous performance. Please let me know if you would like me to try more
> > things and what other information you need for the cases without --no-inline.
> 
> Performance-wise, things are working as expected. I'm confused about
> why we see different symbol names; perhaps this points to a libdw bug.
> With or without --no-inline, libelf gets the symbol name; libdw is only
> used to get the source line and inlining information. Perhaps this is
> more of a bug with `-g none`, which is an option I've never used. I'm
> quite busy at the moment, so it's not easy for me to dig into this.
> Perhaps we can create a test and try to get an LLM to investigate it.

Thank you for your explanations too. I will use --no-inline for now. The
option -g none just produces a flat profile even if you recorded with -g.
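
As an illustration of the fallback order described earlier in the thread,
here is a minimal Python sketch. It is a toy model, not perf's code: the
function resolve_srcline, the stand-in backends, and the example address
are all hypothetical. Only the backend order (libdw, llvm, libbfd,
addr2line) and the idea of pinning the lookup with addr2line.style=libdw
come from the discussion above.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A backend maps an address to a source-line string, or None on failure.
Backend = Callable[[int], Optional[str]]


def resolve_srcline(
    addr: int,
    backends: Sequence[Tuple[str, Backend]],
    style: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (backend name, source line) from the first backend that succeeds.

    If style is given, only the backend with that name is tried. This is
    an assumed model of what pinning addr2line.style=libdw does.
    """
    for name, backend in backends:
        if style is not None and name != style:
            continue  # pinned to one backend: skip every fallback
        line = backend(addr)
        if line is not None:
            return name, line
    return None


# Records which stand-in backends were invoked, in order.
calls: List[str] = []


def make_backend(name: str, result: Optional[str]) -> Backend:
    """Build a hypothetical backend that logs its call and returns result."""
    def backend(addr: int) -> Optional[str]:
        calls.append(name)
        return result
    return backend


# Hypothetical stand-ins: the first three backends fail, the last succeeds.
BACKENDS = [
    ("libdw", make_backend("libdw", None)),
    ("llvm", make_backend("llvm", None)),
    ("libbfd", make_backend("libbfd", None)),
    ("addr2line", make_backend("addr2line", "foo.c:42")),
]

if __name__ == "__main__":
    # Default: every backend is tried until one succeeds.
    print(resolve_srcline(0x401000, BACKENDS), calls)
    calls.clear()
    # Pinned to libdw: a failed lookup stops there, with no fallback.
    print(resolve_srcline(0x401000, BACKENDS, style="libdw"), calls)
```

In this toy model an address that libdw cannot resolve costs four lookups
by default but only one when pinned. That matches the direction of the two
timings quoted above (241.60 vs 137.93), although it says nothing about
where the remaining time in libdw itself goes.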

Best regards,
-Guilherme


Thread overview: 5+ messages
2026-04-22 13:26 perf performance with libdw vs libunwind Guilherme Amadio
2026-04-23  4:21 ` Ian Rogers
2026-04-23  9:49   ` Guilherme Amadio
2026-04-23 22:28     ` Ian Rogers
2026-04-27  7:12       ` Guilherme Amadio [this message]
