From: Jiri Olsa <olsajiri@gmail.com>
To: William Cohen <wcohen@redhat.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: PROBLEM: The --call-graph=fp data does do not agree with the --call-graph=dwarf results
Date: Wed, 31 Aug 2022 14:17:26 +0200
Message-ID: <Yw9RVlb3YxpTxROw@krava>
In-Reply-To: <cb3efad5-37eb-29e4-03b0-f0bcc8be9918@redhat.com>
On Tue, Aug 30, 2022 at 10:21:28AM -0400, William Cohen wrote:
> With Fedora 36's perf-tools-5.18.13-200.fc36, I was examining
> where perf-report was spending its time when generating its report and
> found there was an efficiency issue in Fedora 36's binutils-2.37. The
> efficiency issue has been addressed in Fedora Rawhide and will be
> backported to Fedora 36
> (https://bugzilla.redhat.com/show_bug.cgi?id=2120752). This was
nice :)
> initially discovered when processing perf.data files created with
> --call-graph=dwarf. The output of the perf-report call-graph for
> dwarf information notes inlined functions in the report. The excessive
> time spent in binutils bfd's lookup_func_by_offset was caused by perf-report
> building up a red-black tree mapping IP addresses to functions
> including inlined functions.
>
> I ran a similar experiment with --call-graph=fp to see if it triggered
> the same excessive overhead in building the red-black tree for
> inlined functions. It did not. The resulting output of the perf-report
> for --call-graph=fp does not include information about inlined functions.
>
> I have a small reproducer in the attached perf_inlined.tar.gz that
> demonstrates the difference between the two methods of storing
> call-chain information. Compile and collect data with:
>
> tar xvfz perf_inlined.tar.gz
> cd perf_inlined
> make all
> perf report --input=perf_fp.data > fp.log
> perf report --input=perf_dwarf.data > dwarf.log
>
> The dwarf.log has the expected call structure for main:
>
>
> main
> |
>  --85.72%--fill_array (inlined)
>            |
>            |--78.09%--rand
>            |          |
>            |           --75.10%--__random
>            |                     |
>            |                      --9.14%--__random_r
>            |
>            |--1.58%--compute_sqrt (inlined)
>            |
>             --1.32%--_init
>
> The fp.log looks odd given the program:
what's odd about that? perhaps the confusion is that you are
in children mode? you could try --no-children
>
>     99.99%     0.00%  time_waste  libc.so.6  [.] __libc_start_call_main
>         |
>         ---__libc_start_call_main
>            |
>            |--66.07%--__random
>            |
>            |--21.28%--main
>            |
>            |--8.42%--__random_r
>            |
>            |--2.91%--rand
>            |
>             --1.31%--_init
>
> Given how commonly functions are inlined in optimized code, it seems
> like perf-report for --call-graph=fp should include information about
> time spent in inlined functions.
hum, so the 'fp' call graph is traversing frame pointers, which I would
not expect to exist for inlined functions
jirka
Thread overview: 6+ messages
2022-08-30 14:21 PROBLEM: The --call-graph=fp data does do not agree with the --call-graph=dwarf results William Cohen
2022-08-31 12:17 ` Jiri Olsa [this message]
2022-08-31 13:47 ` William Cohen
2022-08-31 14:22 ` Milian Wolff
2022-08-31 16:15 ` Arnaldo Carvalho de Melo
2022-09-08 19:12 ` William Cohen