* frame-pointer based user stack unwinding with perf on arm32
@ 2016-11-14 13:14 Milian Wolff
2016-11-14 23:45 ` Kim Phillips
0 siblings, 1 reply; 2+ messages in thread
From: Milian Wolff @ 2016-11-14 13:14 UTC
To: perf group
Hey all,
In principle, from what I understand and have read in various places online,
it should be possible to unwind user stacks with perf using frame pointers
on arm32 platforms. See e.g.:
http://lxr.free-electrons.com/source/arch/arm/kernel/perf_callchain.c?v=4.1#L62
Today, I tried this again, and could not make it work.
I used the stress_bt test application [1], and compiled it with various
combinations of
-fno-omit-frame-pointer
-mapcs-frame
-mtpcs-frame
-funwind-tables
-fasynchronous-unwind-tables
[1]: https://wiki.linaro.org/LEG/Engineering/TOOLS/perf-callstack-unwinding#Backtrace_stress_application
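For reference, a dry-run sketch of how such flag combinations could be enumerated. It only prints one compile command per non-empty subset of the five flags listed above. The `CC` default, the extra `-g`, and the `stress_bt.<n>` output names are assumptions for illustration, not the exact commands that were used:

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one compile command per non-empty subset of the flags.
# CC, the added -g, and the stress_bt.<mask> naming are assumptions.
CC="${CC:-gcc}"
flags=(-fno-omit-frame-pointer -mapcs-frame -mtpcs-frame
       -funwind-tables -fasynchronous-unwind-tables)

gen_builds() {
    local n=${#flags[@]} mask i
    # Each bit of mask selects one flag; mask 0 (no flags) is skipped.
    for ((mask = 1; mask < (1 << n); mask++)); do
        local selected=()
        for ((i = 0; i < n; i++)); do
            if (( mask & (1 << i) )); then
                selected+=("${flags[i]}")
            fi
        done
        echo "$CC -g ${selected[*]} -o stress_bt.$mask stress_bt.c"
    done
}

gen_builds
```

Piping the output through `sh` on a machine with a suitable compiler would produce one binary per combination, each of which can then be fed to `perf record -g`.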
None of these produced the desired results when running `perf record -g` on
the target platform (a panda board):
root@arm:~# perf record -g ./stress_bt
Total count: 171711327751528502
root@arm:~# perf script
<snip>
...
stress_bt 825 7645.3346298627: 8241360 cycles:ppp:
5a0 foo_128+0xfffe0084 (/root/stress_bt)
stress_bt 825 7645.3346305738: 7932022 cycles:ppp:
592 foo_128+0xfffe0076 (/root/stress_bt)
...
root@arm:~# uname -a
Linux arm 4.1.30-armv7-x7 #1 SMP Thu Aug 11 17:44:31 CEST 2016 armv7l GNU/Linux
root@arm:~# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 698.80
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
processor : 1
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 698.80
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
Hardware : Generic OMAP4 (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000
When I do the same on an aarch64 platform, frame pointers seem to work as
intended.
So, can someone please clarify whether this should also work on arm32? What
are the requirements?
Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
* Re: frame-pointer based user stack unwinding with perf on arm32
2016-11-14 13:14 frame-pointer based user stack unwinding with perf on arm32 Milian Wolff
@ 2016-11-14 23:45 ` Kim Phillips
0 siblings, 0 replies; 2+ messages in thread
From: Kim Phillips @ 2016-11-14 23:45 UTC
To: Milian Wolff; +Cc: perf group
On Mon, 14 Nov 2016 14:14:30 +0100
Milian Wolff <milian.wolff@kdab.com> wrote:
Hi Milian,
> None of these produced the desired results when running `perf record -g` on
> the target platform (a panda board):
>
> root@arm:~# perf record -g ./stress_bt
> Total count: 171711327751528502
> root@arm:~# perf script
> <snip>
> ...
> stress_bt 825 7645.3346298627: 8241360 cycles:ppp:
> 5a0 foo_128+0xfffe0084 (/root/stress_bt)
>
> stress_bt 825 7645.3346305738: 7932022 cycles:ppp:
> 592 foo_128+0xfffe0076 (/root/stress_bt)
> ...
> So, can someone please clarify whether this should also work on arm32? What
> are the requirements?
I have this working with a natively-built perf (today's perf/core
branch from acme's tree):
$ ./perf --version
perf version 4.9.rc1.g699c
$ uname -a
Linux tc2 4.8.0+ #7 SMP Tue Oct 4 10:29:55 CDT 2016 armv7l GNU/Linux
$ cat ./runcallg.sh
sudo ./perf record -o perf.data --call-graph dwarf -- ./stress_bt |& tee record-callg.log
sudo ./perf report --call-graph --stdio >& report-callg.log
$ ./runcallg.sh
Lowering default frequency rate to 1600.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
Total count: 171711327751528502
[ perf record: Woken up 514 times to write data ]
Warning:
Processed 19294 events and lost 32 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 128.420 MB perf.data (16025 samples) ]
$ head -40 report-callg.log
Warning:
Processed 19294 events and lost 32 chunks!
Check IO/CPU overload!
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 16K of event 'cycles:ppp'
# Event count (approx.): 7863498267
#
# Children Self Command Shared Object Symbol
# ........ ........ ......... ................. ..................................
#
99.62% 99.58% stress_bt stress_bt [.] foo_128
|
|--95.74%--__libc_start_main
| main
| doit
| bar
| |
| |--1.01%--foo_30
| | foo_31
| | foo_32
| | foo_33
| | foo_34
| | foo_35
| | foo_36
| | foo_37
| | foo_38
| | foo_39
| | foo_40
| | foo_41
| | foo_42
| | foo_43
| | foo_44
| | foo_45
| | foo_46
The above works both with the arm32 binary included in the downloaded
stress_bt.tar.gz, and one built with a native gcc 4.9.2, using only the
'-g' flag ("gcc -g stress_bt.c").
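Note that the runcallg.sh script above records with `--call-graph dwarf`, whereas plain `-g` selects frame-pointer unwinding (`--call-graph fp`) unless configured otherwise, as far as I know. A small sketch for comparing both modes on the same binary; it only prints the command lines, and the `PERF` variable and the `perf-<mode>.data` file names are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch: build the perf record command line for one unwinding mode.
# PERF and the perf-<mode>.data naming are assumptions for illustration.
PERF="${PERF:-perf}"

record_cmd() {
    local mode="$1"; shift
    # Only the two modes discussed in this thread are handled here.
    case "$mode" in
        fp|dwarf) ;;
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac
    echo "$PERF record -o perf-$mode.data --call-graph $mode -- $*"
}

for mode in fp dwarf; do
    record_cmd "$mode" ./stress_bt
done
```

Printing instead of executing keeps the sketch free of side effects; something like `record_cmd fp ./stress_bt | sh` would run it for real, after which `perf report -i perf-fp.data` and `perf report -i perf-dwarf.data` can be compared.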
OTOH, I tried using a cross-built perf, and it did not work (same
behaviour you're seeing).
The Linaro wiki page lists at least libunwind as a dependency, and the
native build has it:
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... gtk2: [ OFF ]
... libaudit: [ on ]
... libbfd: [ on ]
... libelf: [ on ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ on ]
... libpython: [ on ]
... libslang: [ on ]
... libcrypto: [ on ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ OFF ]
... bpf: [ OFF ]
Makefile.config:349: BPF prologue is not supported by architecture arm, missing regs_query_register_offset()
Makefile.config:422: BPF API too old. Please install recent kernel headers. BPF support in 'perf record' is disabled.
Makefile.config:519: GTK2 not found, disables GTK2 support. Please install gtk2-devel or libgtk2.0-dev
Makefile.config:693: No numa.h found, disables 'perf bench numa mem' benchmark, please install numactl-devel/libnuma-devel/libnuma-dev
whereas the cross build does not:
Auto-detecting system features:
... dwarf: [ OFF ]
... dwarf_getlocations: [ OFF ]
... glibc: [ on ]
... gtk2: [ OFF ]
... libaudit: [ OFF ]
... libbfd: [ OFF ]
... libelf: [ OFF ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libslang: [ OFF ]
... libcrypto: [ OFF ]
... libunwind: [ OFF ]
... libdw-dwarf-unwind: [ OFF ]
... zlib: [ OFF ]
... lzma: [ OFF ]
... get_cpuid: [ OFF ]
... bpf: [ on ]
Makefile.config:260: No libelf found, disables 'probe' tool and BPF support in 'perf record', please install libelf-dev, libelf-devel or elfutils-libelf-devel
Makefile.config:360: No sys/sdt.h found, no SDT events are defined, please install systemtap-sdt-devel or systemtap-sdt-dev
Makefile.config:433: Disabling post unwind, no support found.
Makefile.config:479: No libaudit.h found, disables 'trace' tool, please install audit-libs-devel or libaudit-dev
Makefile.config:490: No libcrypto.h found, disables jitted code injection, please install libssl-devel or libssl-dev
Makefile.config:505: slang not found, disables TUI support. Please install slang-devel, libslang-dev or libslang2-dev
Makefile.config:519: GTK2 not found, disables GTK2 support. Please install gtk2-devel or libgtk2.0-dev
Makefile.config:547: Missing perl devel files. Disabling perl scripting support, please install perl-ExtUtils-Embed/libperl-dev
Makefile.config:590: No 'Python.h' (for Python 2.x support) was found: disables Python support - please install python-devel/python-dev
Makefile.config:680: No liblzma found, disables xz kernel module decompression, please install xz-devel/liblzma-dev
Makefile.config:693: No numa.h found, disables 'perf bench numa mem' benchmark, please install numactl-devel/libnuma-devel/libnuma-dev
Unfortunately, I don't know how to cross-build perf with libunwind
turned on. On Ubuntu, I cd into tools/ and run
'make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- perf'. Installing
something called android-libunwind-dev didn't help, and I can't tell
whether the wiki page covers building perf in a cross environment (in
fact, it references a /lib/arm-linux-gnueabihf/ directory, which is
present on the Debian installation of my Versatile Express target).
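One way to double-check which unwinder a given perf binary ended up with, independent of the build log, is to look at its dynamic dependencies. A minimal sketch, assuming binutils' `readelf` is available on the build host; the helper name `has_unwinder` is made up for illustration, and a statically linked perf would not show up this way:

```shell
#!/usr/bin/env bash
# Sketch: report whether a perf binary dynamically links libunwind or libdw.
# Reads `readelf -d` output on stdin, so it also works on the build host for
# a cross-built binary. has_unwinder is a hypothetical helper name.
has_unwinder() {
    # Match NEEDED entries such as [libunwind.so.8] or [libdw.so.1].
    grep -E 'NEEDED.*\[(libunwind|libdw)[^]]*\]' >/dev/null
}

# Typical use (the ./perf path is an assumption):
#   readelf -d ./perf | has_unwinder && echo "unwinder linked" \
#                                    || echo "no unwinder linked"
```

If neither library is listed, that would match the "Disabling post unwind, no support found." message from the cross build.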
hth,
Kim