public inbox for linux-kernel@vger.kernel.org
From: Guilherme Amadio <amadio@gentoo.org>
To: Ian Rogers <irogers@google.com>
Cc: acme@kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, libunwind-devel@nongnu.org
Subject: Re: perf performance with libdw vs libunwind
Date: Thu, 23 Apr 2026 11:49:16 +0200	[thread overview]
Message-ID: <aenrHEOZDiBKG4Sy@gentoo.org> (raw)
In-Reply-To: <CAP-5=fX6PT2FsvAoy+0AGq3AFEQ9XU_HBhGkZdX0fn0Ltss-JA@mail.gmail.com>

Hi Ian,

On Wed, Apr 22, 2026 at 09:21:52PM -0700, Ian Rogers wrote:
> Hi Guilherme,
> 
> Thanks for the feedback but I'm a little confused. Your .perfconfig is
> set to use frame-pointer-based unwinding, so neither libunwind nor
> libdw should be used for unwinding. With frame-pointer unwinding, a
> sample contains an array of IPs gathered by walking the linked list of
> frame pointers on the stack. With --call-graph=dwarf, a region of
> memory (the stack) is copied into a sample along with some initial
> register values; libdw or libunwind is then used to process this
> memory using the DWARF information in the ELF binary.

Thanks for your reply, and pardon my ignorance: I thought the libraries
were used generically for stack unwinding, regardless of whether it's fp
or dwarf. I should have looked a bit deeper before reporting this, but we
are on the right track.
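
For the record, the fp setting above corresponds to this .perfconfig
fragment (a sketch assuming the perf-config(1) key names; dump-size only
applies to dwarf mode):

```ini
[call-graph]
	record-mode = fp	# walk the frame-pointer chain; no stack copy
#	record-mode = dwarf	# copy stack + registers, unwind in perf report
#	dump-size = 8192	# bytes of stack copied per sample (dwarf only)
```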

> Something that changed in v7.0 is that with dwarf-based libdw or
> libunwind unwinding we always had all inline functions on the stack,
> but not with frame pointers. The IP in the frame pointer array can be
> that of an instruction within an inlined function. In v7.0 we added a
> patch that includes inline information for both frame pointer and
> LBR-based stack traces:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/machine.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf

This is a nice development. I have been using --call-graph=dwarf to see
inlined symbols, so having the ability to see inlined functions with fp
unwinding, which is much more lightweight in terms of space (i.e. the
size of the final perf.data files), is great.

> By default we try to add inline information using libdw; if that fails
> we try llvm, then libbfd, and finally the command-line addr2line tool:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/srcline.c?h=perf-tools-next&id=28cb835f7645892f4559b92fcfeb25a81646f4cf#n145
> I suspect the slowdown comes from doing all this addr2line work on a
> binary that's been stripped. The good news here is that you can add
> a config option to avoid all the fallbacks: "addr2line.style=libdw".
> You can also disable inline information by adding "--no-inline" to
> your `perf report` command line.

The binary and its direct dependent libraries, as well as most other
dependencies, are not stripped, but it's possible that some dependency
further down the chain is.

When I run perf report using --no-inline I indeed recover the performance I
had before with perf-6.19.12. However, setting addr2line.style=libdw did not
help much. Here is what I observe:

$ perf config
call-graph.record-mode=fp
$ perf version
perf version 7.0
$ time perf record -g -F max -e cycles:uk -- root.exe -l -q
info: Using a maximum frequency rate of 63800 Hz
[ perf record: Woken up 18 times to write data ]
[ perf record: Captured and wrote 4.720 MB perf.data (16552 samples) ]
1.26
$ time perf report -q --stdio -g none --children --no-inline --percent-limit 75
    92.65%     0.00%  root.exe  root.exe              [.] main
    92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
    92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_main@@GLIBC_2.34
    92.64%     0.00%  root.exe  root.exe              [.] _start
    91.63%     0.00%  root.exe  libCore.so.6.38.04    [.] ROOT::GetROOT()
    89.46%     0.00%  root.exe  libRint.so.6.38.04    [.] TRint::TRint(char const*, int*, char**, void*, int, bool, bool)
    88.31%     0.00%  root.exe  libCore.so.6.38.04    [.] TApplication::TApplication(char const*, int*, char**, void*, int)
    88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] ROOT::Internal::GetROOT2()
    88.20%     0.00%  root.exe  libCore.so.6.38.04    [.] TROOT::InitInterpreter()
    76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] CreateInterpreter
    76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] TCling::TCling(char const*, char const*, char const* const*, void*)

1.70
Flamegraph: https://amadio.web.cern.ch/perf/perf-report-noinline.svg

Now without --no-inline; this first run is still without addr2line.style=libdw in the config:

$ time perf report -q --stdio -g none --children --percent-limit 75
    92.65%     0.00%  root.exe  root.exe              [.] main
    92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
    92.64%     0.00%  root.exe  root.exe              [.] _start
    88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] GetROOT2 (inlined)
    76.34%     0.00%  root.exe  libCling.so.6.38.04   [.] CreateInterpreter

241.60
Flamegraph: https://amadio.web.cern.ch/perf/perf-report-addr2line.svg

$ perf config addr2line.style=libdw
$ perf config
call-graph.record-mode=fp
addr2line.style=libdw
$ time perf report -q --stdio -g none --children --percent-limit 75
    92.65%     0.00%  root.exe  root.exe              [.] main
    92.64%     0.00%  root.exe  libc.so.6             [.] __libc_start_call_main
    92.64%     0.00%  root.exe  root.exe              [.] _start
    88.22%     0.00%  root.exe  libCore.so.6.38.04    [.] GetROOT2 (inlined)

137.93
Flamegraph: https://amadio.web.cern.ch/perf/perf-report-libdw.svg

The flame graphs above are for the perf-report commands themselves.

So, the performance is fine with --no-inline (1.70s), and with
addr2line.style=libdw it is better (137.93s) than the default (241.60s),
though still far from the --no-inline numbers. The function names are
also less informative in the last two reports (e.g. "GetROOT2 (inlined)"
instead of "ROOT::Internal::GetROOT2()"), so this problem remains.
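
In case it helps others following along, the faster combination as a
.perfconfig fragment (assuming the perf-config(1) key name; --no-inline
on the perf report command line remains the cheapest option):

```ini
[addr2line]
	style = libdw	# skip the llvm -> libbfd -> addr2line fallbacks
```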

> Your report suggests we should tweak the defaults for showing inline
> information. Could you try the options I've suggested and see if they
> remedy the issue for you?

Thank you for the suggestions. Indeed, --no-inline seems to bring back
the previous performance. Please let me know if you would like me to try
more things, and what other information you need for the cases without
--no-inline.

Best regards,
-Guilherme


Thread overview: 4+ messages
2026-04-22 13:26 perf performance with libdw vs libunwind Guilherme Amadio
2026-04-23  4:21 ` Ian Rogers
2026-04-23  9:49   ` Guilherme Amadio [this message]
2026-04-23 22:28     ` Ian Rogers
