* Improved backtraces with srcline in perf report
@ 2016-08-30 13:02 Milian Wolff
2016-08-30 13:19 ` Milian Wolff
0 siblings, 1 reply; 2+ messages in thread
From: Milian Wolff @ 2016-08-30 13:02 UTC
To: linux-perf-users
Hey all,
I would like to work on improving the usability of the way backtraces are
reported by perf. See also my mail on that topic from a few weeks ago
("usability issues with inlining and backtraces").
Status quo:
~~~~~~~~~~~~~~~~
perf report --no-children -s dso,sym,srcline -g address --stdio
# Samples: 8K of event 'cycles:ppp'
# Event count (approx.): 8164367769
#
# Overhead  Shared Object  Symbol  Source:Line
# ........  .............  ......  ...........
#
7.82% lab_mandelbrot [.] main
mandelbrot.h:41
|
|--2.84%--main mandelbrot.h:41
| __libc_start_main +241
| _start +4194346
|
|--2.58%--main mandelbrot.h:41
|
--2.01%--main mandelbrot.h:41
__libc_start_main +241
_start +4194346
7.79% libgcc_s.so.1 [.] __muldc3
libgcc2.c:1945
|
|--3.93%--__muldc3 libgcc2.c:1945
| main mandelbrot.h:39
| __libc_start_main +241
| _start +4194346
|
--3.72%--__muldc3 libgcc2.c:1945
main mandelbrot.h:39
__libc_start_main +241
_start +4194346
~~~~~~~~~~~~~~~~
Note: I have no idea why unwinding failed for the second main entry (2.58%
cost), but that is not the topic of this email.
I would like to fix the following issues in descending priority. Please advise
me whether these ideas are acceptable and, more importantly, where to
implement them in the existing perf code base.
a) Merge entries with equal labels
Most lines of code contain multiple instructions. When their addresses are
translated to srcline (-g address), several addresses map to the same label.
In the case above, for example, keeping them apart only adds visual noise with
no value at all. Instead, I'd like the report to show me:
~~~~~~~~~~~~~~~~
7.82% lab_mandelbrot [.] main
mandelbrot.h:41
|
|--4.85%--main mandelbrot.h:41
| __libc_start_main +241
| _start +4194346
|
|--2.58%--main mandelbrot.h:41
7.79% libgcc_s.so.1 [.] __muldc3
libgcc2.c:1945
|
|--7.79%--__muldc3 libgcc2.c:1945
| main mandelbrot.h:39
| __libc_start_main +241
| _start +4194346
~~~~~~~~~~~~~~~~
The merging of the entries must take backtraces into account, but it must
compare the symbol/label of each backtrace entry, not its ip/address.
Especially for more complex applications, where the same code gets called from
different functions (and we thus have different backtraces), this aggregation
of costs will greatly simplify the analysis.
b) Add inliners
The above report is highly confusing, as a lot of code gets inlined and perf
does not show the symbols of inlined functions. I.e. I'd like the report to
show me:
~~~~~~~~~~~~~~~~
7.82% lab_mandelbrot [.] drawMandelbrot
mandelbrot.h:41
|
|--4.85%--drawMandelbrot mandelbrot.h:41
| main main.cpp:55
| __libc_start_main +241
| _start +4194346
|
|--2.58%--drawMandelbrot mandelbrot.h:41
| main main.cpp:55
7.79% libgcc_s.so.1 [.] __muldc3
libgcc2.c:1945
|
|--7.79%--__muldc3 libgcc2.c:1945
| std::complex<double>::operator*=<double> complex:1326
| main mandelbrot.h:39
| __libc_start_main +241
| _start +4194346
~~~~~~~~~~~~~~~~
The information for that exists in DWARF and can already be read out in perf
using the `bfd_find_inliner_info` code in util/srcline.c:addr2line.
Could someone please tell me what refactoring is required to make this work in
perf? It seems to be non-trivial.
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
* Re: Improved backtraces with srcline in perf report
From: Milian Wolff @ 2016-08-30 13:19 UTC
To: linux-perf-users
Hey all,
Sorry, hit send too early. Resending here with the complete message:
I would like to work on improving the usability of the way backtraces are
reported by perf. See also my mail on that topic from a few weeks ago
("usability issues with inlining and backtraces").
Status quo:
~~~~~~~~~~~~~~~~
perf report --no-children -s dso,sym,srcline -g address --stdio
# Samples: 8K of event 'cycles:ppp'
# Event count (approx.): 8164367769
#
# Overhead  Shared Object  Symbol  Source:Line
# ........  .............  ......  ...........
#
7.82% lab_mandelbrot [.] main
mandelbrot.h:41
|
|--2.84%--main mandelbrot.h:41
| __libc_start_main +241
| _start +4194346
|
|--2.58%--main mandelbrot.h:41
|
--2.01%--main mandelbrot.h:41
__libc_start_main +241
_start +4194346
7.79% libgcc_s.so.1 [.] __muldc3
libgcc2.c:1945
|
|--3.93%--__muldc3 libgcc2.c:1945
| main mandelbrot.h:39
| __libc_start_main +241
| _start +4194346
|
--3.72%--__muldc3 libgcc2.c:1945
main mandelbrot.h:39
__libc_start_main +241
_start +4194346
~~~~~~~~~~~~~~~~
Note: I have no idea why unwinding failed for the second main entry (2.58%
cost), but that is not the topic of this email.
I would like to fix the following issues in descending priority. Please advise
me whether these ideas are acceptable and, more importantly, where to
implement them in the existing perf code base.
a) Merge entries with equal labels
Most lines of code contain multiple instructions. When their addresses are
translated to srcline (-g address), several addresses map to the same label.
In the case above, for example, keeping them apart only adds visual noise with
no value at all. Instead, I'd like the report to show me:
~~~~~~~~~~~~~~~~
7.82% lab_mandelbrot [.] main
mandelbrot.h:41
|
|--4.85%--main mandelbrot.h:41
| __libc_start_main +241
| _start +4194346
|
|--2.58%--main mandelbrot.h:41
7.79% libgcc_s.so.1 [.] __muldc3
libgcc2.c:1945
|
|--7.79%--__muldc3 libgcc2.c:1945
| main mandelbrot.h:39
| __libc_start_main +241
| _start +4194346
~~~~~~~~~~~~~~~~
The merging of the entries must take backtraces into account, but it must
compare the symbol/label of each backtrace entry, not its ip/address.
Especially for more complex applications, where the same code gets called from
different functions (and we thus have different backtraces), this aggregation
of costs will greatly simplify the analysis.
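To make the merge rule concrete, here is a minimal Python sketch. It is not
perf code: the (address, label) frame layout and the addresses themselves are
made up for illustration, and only the labels and costs are taken from the
report above. The key used for merging is the tuple of labels of the whole
backtrace, so two samples are merged only if every frame has the same label.

```python
from collections import defaultdict

# Each sample is (cost in percent, callchain). A callchain is a tuple of
# frames, innermost first; each frame is a hypothetical (address, label) pair.
# The addresses are invented; the labels and costs come from the report.
samples = [
    (2.84, ((0x400a10, "main mandelbrot.h:41"),
            (0x7f00f1, "__libc_start_main +241"),
            (0x40002a, "_start +4194346"))),
    (2.58, ((0x400a14, "main mandelbrot.h:41"),)),
    (2.01, ((0x400a18, "main mandelbrot.h:41"),
            (0x7f00f1, "__libc_start_main +241"),
            (0x40002a, "_start +4194346"))),
]


def merge_by_label(samples):
    """Sum the cost of all samples whose callchains have identical labels."""
    merged = defaultdict(float)
    for cost, chain in samples:
        # Ignore the address, keep only the label of every frame.
        key = tuple(label for _addr, label in chain)
        merged[key] += cost
    return dict(merged)


merged = merge_by_label(samples)
for chain, cost in merged.items():
    print(f"{cost:.2f}%  " + " <- ".join(chain))
```

With the three main samples from the status quo, the two full backtraces
collapse into one entry while the truncated 2.58% backtrace stays separate,
because its chain of labels differs.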
b) Add inliners
The above report is highly confusing, as a lot of code gets inlined and perf
does not show the symbols of inlined functions. I.e. I'd like the report to
show me:
~~~~~~~~~~~~~~~~
7.82% lab_mandelbrot [.] drawMandelbrot
mandelbrot.h:41
|
|--4.85%--drawMandelbrot mandelbrot.h:41
| main main.cpp:55
| __libc_start_main +241
| _start +4194346
|
|--2.58%--drawMandelbrot mandelbrot.h:41
| main main.cpp:55
7.79% libgcc_s.so.1 [.] __muldc3
libgcc2.c:1945
|
|--7.79%--__muldc3 libgcc2.c:1945
| std::complex<double>::operator*=<double> complex:1326
| std::operator*<double> complex:389
| drawMandelbrot mandelbrot.h:39
| main main.cpp:55
| __libc_start_main +241
| _start +4194346
~~~~~~~~~~~~~~~~
The information for that exists in DWARF and can already be read out in perf
using the `bfd_find_inliner_info` code in util/srcline.c:addr2line.
Could someone please tell me what refactoring is required to make this work in
perf? It seems to be non-trivial.
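As a rough illustration of what I mean by adding inliners, here is a small
Python sketch. It is only a model of the idea, not perf code and not the BFD
API: the INLINE_INFO table is a hypothetical stand-in for whatever the DWARF
lookup would return for an address, and the addresses are invented. Each
physical frame is replaced by its list of inlined frames, innermost first.

```python
# Hypothetical inline table: address -> labels of the inlined frames at that
# address, innermost first, ending with the physical function that contains
# the address. In perf this data would have to come from DWARF.
INLINE_INFO = {
    0x400a10: ["drawMandelbrot mandelbrot.h:41", "main main.cpp:55"],
}


def expand_inlined(chain, inline_info):
    """Replace each (address, label) frame by its inline stack, if known."""
    expanded = []
    for addr, label in chain:
        # Fall back to the plain label when no inline info is available.
        expanded.extend(inline_info.get(addr, [label]))
    return expanded


chain = [
    (0x400a10, "main mandelbrot.h:41"),
    (0x7f00f1, "__libc_start_main +241"),
    (0x40002a, "_start +4194346"),
]
for label in expand_inlined(chain, INLINE_INFO):
    print(label)
```

Frames without inline information are passed through unchanged, so the
expansion can be applied to every backtrace.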
c) Skip backtrace entries above main
Low priority, but for me the __libc_start_main and _start entries are just
useless noise and could be skipped. Many other tools, including GDB, do this.
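A minimal Python sketch of that filter, assuming backtraces are lists of
labels with the innermost frame first and that the function name is the first
word of each label (both are assumptions for illustration, not how perf
stores callchains):

```python
def trim_above_main(chain, entry="main"):
    """Drop all frames above the first frame whose function is `entry`."""
    for i, label in enumerate(chain):
        # The function name is assumed to be the first word of the label.
        if label.partition(" ")[0] == entry:
            return chain[:i + 1]
    # No entry frame found (e.g. unwinding failed): leave the chain alone.
    return chain


chain = [
    "drawMandelbrot mandelbrot.h:41",
    "main main.cpp:55",
    "__libc_start_main +241",
    "_start +4194346",
]
print(trim_above_main(chain))
```

Backtraces that never reach main are returned unchanged, so samples with a
broken unwind are not hidden.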
d) Skip backtrace entries outside "user-functions +1"
One of the most useful features in VTune's GUI is its capability to skip
backtrace entries that lie outside the application developer's domain.
This is similar to the above, but goes further. E.g. for the above, I'd only
see
~~~~~~~~~~~~~~~~
7.79% libgcc_s.so.1 [.] std::operator*<double>
libgcc2.c:1945
|
|--7.79%-- std::operator*<double> complex:389
| drawMandelbrot mandelbrot.h:39
| main main.cpp:55
| __libc_start_main +241
| _start +4194346
~~~~~~~~~~~~~~~~
This by itself does not look like a big advantage. But in reality, paired
with the other changes above, it will increase the cost for this entry
considerably, as many other entries that are somewhere below complex:389 will
get aggregated into it. Doing this aggregation in my head, I see that the
cost would then lie at approximately 24%. In a simple application I can get
to that number by looking at the top-down cost (i.e. omitting --no-children),
but for complex applications that is not an option, as the same functions
will often be called from different places.
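The "user-functions +1" rule can be sketched in Python as follows. This is
an illustration under my own assumptions, not VTune's or perf's logic:
backtraces are lists of labels with the innermost frame first, and the
is_user predicate, which decides what counts as application code, is
hypothetical. Everything deeper than the one frame directly called from user
code is cut off, so different library-internal backtraces end up with the
same key and can be aggregated.

```python
USER_FUNCTIONS = {"drawMandelbrot", "main"}  # hypothetical user code set


def is_user(label):
    """Assumed predicate: the first word of the label is a user function."""
    return label.partition(" ")[0] in USER_FUNCTIONS


def collapse_to_user_plus_one(chain, is_user):
    """Keep the innermost user frame, its direct callee, and all callers."""
    for i, label in enumerate(chain):
        if is_user(label):
            # i - 1 is the "+1" frame; clamp for chains starting in user code.
            return chain[max(i - 1, 0):]
    # No user frame at all: leave the chain untouched.
    return chain


deep = [
    "__muldc3 libgcc2.c:1945",
    "std::complex<double>::operator*=<double> complex:1326",
    "std::operator*<double> complex:389",
    "drawMandelbrot mandelbrot.h:39",
    "main main.cpp:55",
]
shallow = [
    "std::operator*<double> complex:389",
    "drawMandelbrot mandelbrot.h:39",
    "main main.cpp:55",
]
print(collapse_to_user_plus_one(deep, is_user))
```

Both example chains collapse to the same three frames, which is what allows
their costs to be summed under std::operator*<double>.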
From my POV as an application developer, this kind of aggregation is highly
advantageous. Most of the time one's hands are tied and one cannot fix
anything in the implementation of functions in third-party libraries (like
stdlib). What one can do, though, is improve one's application code to call
certain functions less often.
Thanks for reading this far. As I said, suggestions on how to implement these
features would be highly welcome. If you think that the above are bad ideas,
please explain why.
Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts