* perf call stacks on 32bit ARM v7
From: Milian Wolff @ 2016-09-27 21:25 UTC
To: jean.pihet; +Cc: perf group, wangnan0

Hello Jean, others.

Can someone please clarify the requirements for getting perf to properly unwind the call stack on 32-bit ARMv7?

Looking at [1], it seems that I need either (a) frame pointers or (b) DWARF debug information. Because (a) is often not available and (b) is too large for the small flash drives found on embedded devices, how do I use perf with split debug information files? I.e. I have tried to record on the ARM board using DWARF:

  arm-v7$ perf record --call-graph dwarf ...

Then I transferred the perf.data file over to my host machine. perf archive said that no build IDs could be found, so I'm not using that. Instead, I tried to ask perf to find the split debug packages using --symfs:

  x86-64$ perf report --symfs ... -g graph

But that does not work and I'm not seeing any backtraces. Stracing the report, I don't see it even trying to access files. How can I debug this and figure out what I'm supposed to be using?

Also, according to [1], ARM .exidx unwind tables (c) are not supported by perf. Is that still the case? If so, what is holding back support for that in perf, considering that libunwind supposedly supports unwinding using that information?

Thanks

[1]: https://archive.fosdem.org/2015/schedule/event/arm_perf/attachments/slides/601/export/events/attachments/arm_perf/slides/601/Fosdem_2015_perf_status_on_ARM_and_ARM64.pdf

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
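For the split-debug question: one widely used convention for locating separate debug files is keyed by the ELF build ID. The GDB manual documents the layout `<debug-dir>/.build-id/<first two hex digits>/<remaining digits>.debug`, usually under `/usr/lib/debug`. The Python sketch below only computes that path. Whether a particular perf version consults it (instead of, or in addition to, its own `~/.debug` cache filled by `perf archive` / `perf buildid-cache`) is an assumption to verify. It also presupposes that the binaries carry a build ID at all (for example, linked with `-Wl,--build-id`); the "no build IDs found" message above may indicate they do not. The helper name and the sample build ID are hypothetical.

```python
import posixpath


# Hypothetical helper: maps an ELF build ID to the GDB-style split-debug path.
def build_id_debug_path(build_id: str, debug_dir: str = "/usr/lib/debug") -> str:
    """Return <debug_dir>/.build-id/<xx>/<rest>.debug for a hex build ID."""
    bid = build_id.strip().lower()
    if len(bid) < 3 or any(c not in "0123456789abcdef" for c in bid):
        raise ValueError(f"not a hex build ID: {build_id!r}")
    # The first two hex digits name the subdirectory, the rest names the file.
    return posixpath.join(debug_dir, ".build-id", bid[:2], bid[2:] + ".debug")


if __name__ == "__main__":
    # Made-up build ID, in the hex form `readelf -n` prints.
    print(build_id_debug_path("3a9f0c5e7b1d2e4f60718293a4b5c6d7e8f90123"))
```

The `debug_dir` parameter makes it easy to point the lookup at a copied target sysroot on the host rather than the host's own `/usr/lib/debug`.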
* Re: perf call stacks on 32bit ARM v7
From: Wangnan (F) @ 2016-09-28 1:43 UTC
To: Milian Wolff, jean.pihet; +Cc: perf group, hekuang 00206996

On 2016/9/28 5:25, Milian Wolff wrote:
> Hello Jean, others.
>
> Can someone please clarify the requirements for getting perf to properly unwind the call stack on 32-bit ARMv7?
>
> Looking at [1], it seems that I need either (a) frame pointers or (b) DWARF debug information. Because (a) is often not available and (b) is too large for the small flash drives found on embedded devices, how do I use perf with split debug information files? I.e. I have tried to record on the ARM board using DWARF:
>
>   arm-v7$ perf record --call-graph dwarf ...
>
> Then I transferred the perf.data file over to my host machine. perf archive said that no build IDs could be found, so I'm not using that. Instead, I tried to ask perf to find the split debug packages using --symfs:
>
>   x86-64$ perf report --symfs ... -g graph
>
> But that does not work and I'm not seeing any backtraces. Stracing the report, I don't see it even trying to access files. How can I debug this and figure out what I'm supposed to be using?

Unfortunately, perf currently only supports cross-decoding DWARF for x86_64, x86_32 and arm64. ARM32 is not on the list.

Please see:

  http://www.spinics.net/lists/kernel/msg2266293.html

and

  tools/perf/util/unwind-libunwind.c

I think adding ARM32 support should not be very hard on the perf side. I've added He Kuang to the CC list; he is the author of this patch set.

Thank you.
* Re: perf call stacks on 32bit ARM v7
From: Milian Wolff @ 2016-09-29 10:33 UTC
To: Jean Pihet; +Cc: Wangnan (F), linux-perf-users, hekuang 00206996

On Thursday, 29 September 2016 09:39:51 CEST Jean Pihet wrote:
> Hi Milian, Wangnan,
>
> I am pretty busy this week. Let me check and come back to you next week.

Sure, take your time - much appreciated!

> PS: did you check the detailed instructions from the Linaro wiki pages?
> These are linked from the Fosdem presentation slides.

You mean this page, right:

  https://wiki.linaro.org/KenWerner/Sandbox/libunwind

That does not mention perf at all and only talks about libunwind. Note how your talk and the wiki both say that .exidx unwinding using libunwind works. But your slides also explicitly say that this mechanism is not yet supported by perf. I would like to know the reason for that, or whether this information is outdated.

Thanks!
* Re: perf call stacks on 32bit ARM v7
From: Jean Pihet @ 2016-09-30 7:32 UTC
To: Milian Wolff; +Cc: Wangnan (F), linux-perf-users, hekuang 00206996

Hi,

On Thu, Sep 29, 2016 at 12:33 PM, Milian Wolff <milian.wolff@kdab.com> wrote:
> On Thursday, 29 September 2016 09:39:51 CEST Jean Pihet wrote:
>> Hi Milian, Wangnan,
>>
>> I am pretty busy this week. Let me check and come back to you next week.
>
> Sure, take your time - much appreciated!
>
>> PS: did you check the detailed instructions from the Linaro wiki pages?
>> These are linked from the Fosdem presentation slides.
>
> You mean this page, right:
>
>   https://wiki.linaro.org/KenWerner/Sandbox/libunwind

I mean

  https://wiki.linaro.org/LEG/Engineering/TOOLS/perf-callstack-unwinding

where the details about installation, compilation etc. can be found.

In short, DWARF unwinding on ARMv7 should work out of the box. You need libunwind, the correct kernel config options, and perf installed. As you mention, the downside is the size of the generated trace, which can be a limitation on embedded systems. Some parameters can be used to control the size of the generated trace (e.g. the sampling frequency, -F).

> That does not mention perf at all and only talks about libunwind. Note how your talk and the wiki both say that .exidx unwinding using libunwind works. But your slides also explicitly say that this mechanism is not yet supported by perf. I would like to know the reason for that, or whether this information is outdated.

I do not recall the details, but .exidx is not supported on ARM either because the compiler does not generate the info in the binaries (you can check the ELF sections for it) or because the kernel does not unwind using that info.

> Thanks!

I hope this helps!

Regards,
Jean Pihet
www.newoldbits.com
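Checking the ELF sections, as suggested above, is normally done with `readelf -S` from binutils. As a dependency-free alternative, here is a minimal Python sketch (stdlib `struct` only) that lists the section names of an ELF32 or ELF64 file and reports whether common unwind-related sections are present: `.ARM.exidx`/`.ARM.extab` (the ARM EHABI unwind index and table), and `.eh_frame`/`.debug_frame` (DWARF call-frame information). It is a sketch under simplifying assumptions: it ignores extended section numbering and does no bounds validation. Finding `.ARM.exidx` in a binary says nothing about whether perf will use it.

```python
import struct
import sys

# Sections that can carry unwind information in ARM/Linux binaries.
UNWIND_SECTIONS = (".ARM.exidx", ".ARM.extab", ".eh_frame", ".debug_frame")


def elf_section_names(data: bytes) -> list:
    """Return the section names of an ELF32/ELF64 image, in header order."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    end = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    if data[4] == 1:  # EI_CLASS 1 = ELF32
        shoff = struct.unpack_from(end + "I", data, 0x20)[0]
        counts_at, span_fmt, span_at = 0x2E, end + "II", 0x10
    elif data[4] == 2:  # EI_CLASS 2 = ELF64
        shoff = struct.unpack_from(end + "Q", data, 0x28)[0]
        counts_at, span_fmt, span_at = 0x3A, end + "QQ", 0x18
    else:
        raise ValueError("unknown ELF class")
    # e_shentsize, e_shnum and e_shstrndx are consecutive 16-bit fields.
    shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, counts_at)
    if shoff == 0 or shnum == 0:
        return []  # no section header table (extended numbering not handled)

    def section(i):
        base = shoff + i * shentsize
        name_off = struct.unpack_from(end + "I", data, base)[0]  # sh_name
        offset, size = struct.unpack_from(span_fmt, data, base + span_at)  # sh_offset, sh_size
        return name_off, offset, size

    # Section names live in the section-header string table (e_shstrndx).
    _, str_off, str_size = section(shstrndx)
    strtab = data[str_off:str_off + str_size]
    names = []
    for i in range(shnum):
        name_off = section(i)[0]
        names.append(strtab[name_off:strtab.index(b"\0", name_off)].decode())
    return names


def unwind_sections(path: str) -> dict:
    """Map each unwind-related section name to whether `path` contains it."""
    with open(path, "rb") as f:
        present = set(elf_section_names(f.read()))
    return {name: name in present for name in UNWIND_SECTIONS}


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, unwind_sections(path))
```

Run it as `python3 check_unwind.py libfoo.so` (file names hypothetical) on both a default build and a `-funwind-tables` build to compare.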
* Re: perf call stacks on 32bit ARM v7
From: Milian Wolff @ 2016-10-02 21:12 UTC
To: Jean Pihet; +Cc: Wangnan (F), linux-perf-users, hekuang 00206996

On Friday, 30 September 2016 09:32:44 CEST Jean Pihet wrote:
> On Thu, Sep 29, 2016 at 12:33 PM, Milian Wolff <milian.wolff@kdab.com> wrote:
> > You mean this page, right:
> >
> >   https://wiki.linaro.org/KenWerner/Sandbox/libunwind
>
> I mean
> https://wiki.linaro.org/LEG/Engineering/TOOLS/perf-callstack-unwinding
> where the details about installation, compilation etc. can be found.
>
> In short, DWARF unwinding on ARMv7 should work out of the box. You need libunwind, the correct kernel config options, and perf installed. As you mention, the downside is the size of the generated trace, which can be a limitation on embedded systems. Some parameters can be used to control the size of the generated trace (e.g. the sampling frequency, -F).

No, I mean the size of the application code when debug symbols are added:

  tmp$ du -hs libQt5Core.so.5.7.1
  63M     libQt5Core.so.5.7.1
  tmp$ strip libQt5Core.so.5.7.1
  tmp$ du -hs libQt5Core.so.5.7.1
  4.8M    libQt5Core.so.5.7.1

And this is just one of the many libraries involved in the applications I have to deal with regularly. I just had a case where the target platform had about 50MB of free storage space. I resorted to connecting a USB drive to load the application code with debug symbols from. But this changes the performance behavior of the application, as the code is no longer loaded from the often excruciatingly slow on-board flash drive.

> > That does not mention perf at all and only talks about libunwind. Note how your talk and the wiki both say that .exidx unwinding using libunwind works. But your slides also explicitly say that this mechanism is not yet supported by perf. I would like to know the reason for that, or whether this information is outdated.
>
> I do not recall the details, but .exidx is not supported on ARM either because the compiler does not generate the info in the binaries (you can check the ELF sections for it)

If I compile with -funwind-tables, then I do get the .exidx section; see this link I posted earlier:

  https://wiki.linaro.org/KenWerner/Sandbox/libunwind

> or because the kernel does not unwind using that info.

That sounds more like the culprit. Can someone confirm that:

- for frame-pointer-based unwinding, perf unwinds the stacks in kernel space,
- whereas DWARF-based unwinding copies the stack together with the samples into perf.data and does the unwinding later on?

So one would need to either:

- add .exidx unwinding support to the kernel, or
- copy whatever is required into perf.data and do the unwinding later on.

Can someone clarify what is actually required to do that?

> > Thanks!
>
> I hope this helps!

Cheers
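The two unwinding modes described above (confirmed later in the thread) can be illustrated with a toy model. The Python sketch below is not perf's code. It simulates process memory as a dict of 32-bit words and assumes a simplified, hypothetical frame layout in which `[fp]` holds the caller's saved frame pointer and `[fp + 4]` the return address. Real ARM/Thumb frame layouts differ by compiler and instruction set. `walk_frame_pointers` mimics walking the chain at sample time. `snapshot_stack` mimics copying a fixed-size block of the user stack (as `--call-graph dwarf[,size]` does) so that unwinding can happen later, and it shows why frames beyond the copied region are lost. For brevity the offline walk reuses the toy frame-pointer chain; a real DWARF unwinder (e.g. libunwind, as used via tools/perf/util/unwind-libunwind.c) would instead evaluate `.eh_frame`/`.debug_frame` CFI rules over the copied bytes.

```python
from typing import Dict, List

WORD = 4  # bytes per word on a 32-bit target


def walk_frame_pointers(memory: Dict[int, int], pc: int, fp: int,
                        max_depth: int = 64) -> List[int]:
    """Collect a call chain by following saved frame pointers.

    Hypothetical layout: [fp] = caller's frame pointer, [fp + WORD] = return address.
    """
    chain = [pc]
    while fp and len(chain) < max_depth:
        if fp not in memory or fp + WORD not in memory:
            break  # frame lies outside the memory we can read
        prev_fp, ret_addr = memory[fp], memory[fp + WORD]
        chain.append(ret_addr)
        if prev_fp <= fp:
            break  # stack grows down, so the caller's frame must be higher
        fp = prev_fp
    return chain


def snapshot_stack(memory: Dict[int, int], sp: int, size: int) -> Dict[int, int]:
    """Copy the words in [sp, sp + size), like a fixed-size user stack dump."""
    return {addr: val for addr, val in memory.items() if sp <= addr < sp + size}


if __name__ == "__main__":
    # Three nested frames at 0x1000, 0x1010 and 0x1020 (toy addresses).
    mem = {0x1000: 0x1010, 0x1004: 0x8100,
           0x1010: 0x1020, 0x1014: 0x8200,
           0x1020: 0x0000, 0x1024: 0x8300}
    live = walk_frame_pointers(mem, pc=0x8000, fp=0x1000)
    offline = walk_frame_pointers(snapshot_stack(mem, 0x1000, 0x18), 0x8000, 0x1000)
    print([hex(a) for a in live])     # walks all three frames
    print([hex(a) for a in offline])  # stops where the 0x18-byte copy ends
```

The trade-off mirrors the thread: the live walk needs no copied data but requires frame pointers in every frame, while the offline variant needs only a stack copy at sample time, at the cost of trace size and of losing frames that lie beyond the copied bytes.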
* Re: perf call stacks on 32bit ARM v7
From: Jean Pihet @ 2016-10-04 8:41 UTC
To: Milian Wolff; +Cc: Wangnan (F), linux-perf-users, hekuang 00206996

On Sun, Oct 2, 2016 at 11:12 PM, Milian Wolff <milian.wolff@kdab.com> wrote:
> On Friday, 30 September 2016 09:32:44 CEST Jean Pihet wrote:
>> In short, DWARF unwinding on ARMv7 should work out of the box. You need libunwind, the correct kernel config options, and perf installed. As you mention, the downside is the size of the generated trace, which can be a limitation on embedded systems. Some parameters can be used to control the size of the generated trace (e.g. the sampling frequency, -F).
>
> No, I mean the size of the application code when debug symbols are added:
>
>   tmp$ du -hs libQt5Core.so.5.7.1
>   63M     libQt5Core.so.5.7.1
>   tmp$ strip libQt5Core.so.5.7.1
>   tmp$ du -hs libQt5Core.so.5.7.1
>   4.8M    libQt5Core.so.5.7.1
>
> And this is just one of the many libraries involved in the applications I have to deal with regularly. I just had a case where the target platform had about 50MB of free storage space. I resorted to connecting a USB drive to load the application code with debug symbols from. But this changes the performance behavior of the application, as the code is no longer loaded from the often excruciatingly slow on-board flash drive.

That is correct. You need some space for the debug symbols and also for the trace generation (I am using RAM for speed reasons).

>> I do not recall the details, but .exidx is not supported on ARM either because the compiler does not generate the info in the binaries (you can check the ELF sections for it)
>
> If I compile with -funwind-tables, then I do get the .exidx section; see this link I posted earlier:
>
>   https://wiki.linaro.org/KenWerner/Sandbox/libunwind
>
>> or because the kernel does not unwind using that info.
>
> That sounds more like the culprit. Can someone confirm that:
>
> - for frame-pointer-based unwinding, perf unwinds the stacks in kernel space,
> - whereas DWARF-based unwinding copies the stack together with the samples into perf.data and does the unwinding later on?

Correct! This DWARF unwinding method generates a lot of trace data. Also, real-time trace decoding is not possible.

> So one would need to either:
>
> - add .exidx unwinding support to the kernel, or
> - copy whatever is required into perf.data and do the unwinding later on.
>
> Can someone clarify what is actually required to do that?

That requires a more extensive analysis.

As you know, I am a freelance consultant and could work on that topic, as I did for Linaro before.

Regards,
Jean Pihet
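To put a rough number on "a lot of trace data": per the perf-record documentation, `--call-graph dwarf` copies a user stack dump of 8192 bytes per sample by default, and the size can be changed with `--call-graph dwarf,<size>`. The Python sketch below multiplies that by the sampling frequency (`-F`) and the number of CPUs busy running the profiled code. It is an approximation under stated assumptions: it assumes every sample carries a full-size dump and ignores sample headers, registers and other sample fields. The `overhead` parameter is a hypothetical per-sample allowance for those.

```python
DEFAULT_DWARF_DUMP = 8192  # bytes per sample: perf record's documented default for --call-graph dwarf


def dwarf_record_rate(freq_hz: int, dump_size: int = DEFAULT_DWARF_DUMP,
                      busy_cpus: int = 1, overhead: int = 0) -> int:
    """Approximate perf.data growth in bytes per second for DWARF call graphs.

    overhead is a hypothetical per-sample allowance for headers and registers.
    """
    return freq_hz * busy_cpus * (dump_size + overhead)


if __name__ == "__main__":
    for freq, size in [(4000, 8192), (1000, 8192), (1000, 4096)]:
        rate = dwarf_record_rate(freq, size)
        print(f"-F {freq}, dwarf,{size}: {rate / 2**20:.2f} MiB/s per busy CPU")
```

At `-F 4000` with the default dump size, that is 4000 × 8192 = 32,768,000 bytes (31.25 MiB) per second per busy CPU. This is why lowering `-F` or the dump size, or recording to RAM as Jean does, matters on small targets.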
* Re: perf call stacks on 32bit ARM v7
From: Milian Wolff @ 2016-10-04 12:01 UTC
To: Jean Pihet; +Cc: Wangnan (F), linux-perf-users, hekuang 00206996

On Tuesday, October 4, 2016 10:41:25 AM CEST Jean Pihet wrote:
> On Sun, Oct 2, 2016 at 11:12 PM, Milian Wolff <milian.wolff@kdab.com> wrote:
> > No, I mean the size of the application code when debug symbols are added:
>
> That is correct. You need some space for the debug symbols and also for the trace generation (I am using RAM for speed reasons).

Sure, but in principle we could record samples in DWARF mode without debug symbols on the target, and then do the unwinding on a different host that actually has the debug symbols, right? I.e. once perf actually supports this for ARM32, as it already does for x86_64, x86_32 and arm64 (implemented by He Kuang).

> > That sounds more like the culprit. Can someone confirm that:
> >
> > - for frame-pointer-based unwinding, perf unwinds the stacks in kernel space,
> > - whereas DWARF-based unwinding copies the stack together with the samples into perf.data and does the unwinding later on?
>
> Correct! This DWARF unwinding method generates a lot of trace data. Also, real-time trace decoding is not possible.
>
> > So one would need to either:
> >
> > - add .exidx unwinding support to the kernel, or
> > - copy whatever is required into perf.data and do the unwinding later on.
> >
> > Can someone clarify what is actually required to do that?
>
> That requires a more extensive analysis.
>
> As you know, I am a freelance consultant and could work on that topic, as I did for Linaro before.

It would indeed be nice if someone could contract you to fill these gaps.

Cheers