linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PROBLEM: The --call-graph=fp data does do not agree with the --call-graph=dwarf results
@ 2022-08-30 14:21 William Cohen
  2022-08-31 12:17 ` Jiri Olsa
  2022-09-08 19:12 ` William Cohen
  0 siblings, 2 replies; 6+ messages in thread
From: William Cohen @ 2022-08-30 14:21 UTC (permalink / raw)
  To: linux-perf-users; +Cc: wcohen

[-- Attachment #1: Type: text/plain, Size: 2645 bytes --]

With a perf Fedora 36 perf-tools-5.18.13-200.fc36 I was examining
where perf-report was spending its time when generating its report and
found there was an efficiency issue in Fedora 36's binutils-2.37.  The
efficient issue been addressed in Fedora rawhide and will be
backported to Fedora 36
(https://bugzilla.redhat.com/show_bug.cgi?id=2120752).  This was
initially discovered when processing perf.data files created with
--call-graph=dwarf.  The output of the perf-report call-graph for
dwarf information notes inlined functions in the report. The excessive
time spent in binutils bfd's lookup_func_by_offset was caused by perf-report
building up a red-black tree mapping IP addresses to functions
including inlined functions.

I ran a similar experiment with --call-graph=fp to see if it triggered
the same execessive overhead in building the red-black tree for
inlined functions.  It did not.  The resulting output of the perf-report
for --call-graph=fp does not include information about inlined functions.

I have a small reproducer in the attached perf_inlined.tar.gz that
demonstrates the difference between the two methods of storing
call-chain information. Compile and collect data with:

tar xvfz perf_inlined.tar.gz
cd perf_inlined
make all
perf report --input=perf_fp.data > fp.log
perf report --input=perf_dwarf.data > dwarf.log

The dwarf.log has the expected call structure for main:


               main
               |          
                --85.72%--fill_array (inlined)
                          |          
                          |--78.09%--rand
                          |          |          
                          |           --75.10%--__random
                          |                     |          
                          |                      --9.14%--__random_r
                          |          
                          |--1.58%--compute_sqrt (inlined)
                          |          
                           --1.32%--_init

The fp.log looks odd given program:

   99.99%     0.00%  time_waste  libc.so.6             [.] __libc_start_call_main
            |
            ---__libc_start_call_main
               |          
               |--66.07%--__random
               |          
               |--21.28%--main
               |          
               |--8.42%--__random_r
               |          
               |--2.91%--rand
               |          
                --1.31%--_init

Given how common that functions are inlined in optimized code it seems
like perf-report of --call-graph=fp should include information about
time spent in inlined functions.

-Will Cohen

[-- Attachment #2: perf_inlined.tar.gz --]
[-- Type: application/gzip, Size: 649 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-09-08 19:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-30 14:21 PROBLEM: The --call-graph=fp data does do not agree with the --call-graph=dwarf results William Cohen
2022-08-31 12:17 ` Jiri Olsa
2022-08-31 13:47   ` William Cohen
2022-08-31 14:22   ` Milian Wolff
2022-08-31 16:15     ` Arnaldo Carvalho de Melo
2022-09-08 19:12 ` William Cohen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).