From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: AKel <ankelly006@hotmail.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Using perf to generate a call stack from a kernel function to the user space caller
Date: Wed, 21 Jan 2015 09:40:21 -0300
Message-ID: <20150121124021.GB3420@kernel.org>
In-Reply-To: <loom.20150121T090925-23@post.gmane.org>
On Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel wrote:
> Thanks Rick for your reply. Unfortunately I'm in the big bad corporate
> world where a simple minion such as myself is not granted sudo access.
That should not be a problem in most cases when profiling your own
apps.
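As a rough sketch of how to check what an unprivileged user is allowed
to do, you can read the perf_event_paranoid sysctl. This assumes a Linux
system with procfs mounted; the default level differs between kernel
versions and distributions, so treat the value on your machine as the
only authority:

```shell
# Path of the sysctl that limits perf for unprivileged users.
paranoid_file=/proc/sys/kernel/perf_event_paranoid

if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
else
    # Not a Linux system, or procfs is not mounted here.
    level=unknown
fi

echo "perf_event_paranoid=$level"
```

Per the kernel sysctl documentation, a level of 2 or higher disallows
kernel profiling for unprivileged users, so at that level only the user
space part of your own processes gets sampled.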
> I've tried passing the callgraph (-g) flag to perf, however I can only
> recover the call stack within the kernel, it begins at what I suspect is
That is because, by default, perf tries to collect %bp (frame pointer)
callchains, and user space programs are usually built without frame
pointers so that they can use the %bp register for other purposes.
> the kernel entry point. Ideally I could generate a complete call stack.
You can try unwinding with DWARF CFI (Call Frame Information) instead:
$ perf record --call-graph dwarf firefox
Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
callgraphs:
$ perf report --no-children # expand callgraphs, press P, press q
$ head -45 perf.hist.0
- 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
_ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
_ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
_ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
_ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv
_ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE
_ZN7mozilla5image11ScaleRunner3RunEv
_ZN8nsThread16ProcessNextEventEbPb
_Z19NS_ProcessNextEventP9nsIThreadb
_ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE
_ZN11MessageLoop3RunEv
_ZN8nsThread10ThreadFuncEPv
_pt_root
start_thread
__clone
- 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back
- __memmove_ssse3_back
- 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_
- 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE
_ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
_ZN7mozilla3net10CacheIndex12ParseRecordsEv
_ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
_ZN7mozilla3net9ReadEvent3RunEv
_ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
_ZN7mozilla3net13CacheIOThread10ThreadFuncEv
_ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
_pt_root
start_thread
__clone
- 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE
_ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
_ZN7mozilla3net10CacheIndex12ParseRecordsEv
_ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
_ZN7mozilla3net9ReadEvent3RunEv
_ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
_ZN7mozilla3net13CacheIOThread10ThreadFuncEv
_ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
_pt_root
start_thread
__clone
+ 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
+ 1,79% firefox libz.so.1.2.8 [.] inflate_fast
+ 1,76% Cache2 I/O libxul.so [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
+ 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock
+ 1,37% firefox firefox [.] arena_dalloc
$
These are C++ methods that need demangling, and I do have the
firefox-debuginfo package matching the binary... I have to check this.
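Until the names come out demangled in the report itself, one workaround
is to pass them through c++filt. This is a minimal sketch, assuming
c++filt from binutils is installed, using one of the symbols from the
output above:

```shell
# One mangled symbol copied from the callchain above.
mangled='_ZN11MessageLoop3RunEv'

if command -v c++filt >/dev/null 2>&1; then
    demangled=$(printf '%s\n' "$mangled" | c++filt)
else
    # Fall back to the raw name when binutils is not available.
    demangled=$mangled
fi

echo "$demangled"
```

With c++filt present this should print MessageLoop::Run(). The same
filter can be fed a whole saved file, e.g. 'c++filt < perf.hist.0'.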
Anyway, either you have your binaries (your main binary and the libraries it
uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can
use '-g', which in its default form is equivalent to '--call-graph fp', or you
have to ask the kernel to collect stack dumps and then use CFI to build the
callchains, i.e. use '--call-graph dwarf'.
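A sketch of the frame pointer route, under a few assumptions: the
fp_demo program and the fp_demo.perf.data file name are made up for
illustration, and gcc is assumed as the compiler. Substitute your own
sources and build system:

```shell
# Hypothetical demo program; substitute your own sources.
cat > fp_demo.c <<'EOF'
#include <stdio.h>

/* noinline keeps work() as a separate frame in the callchain. */
__attribute__((noinline)) static long work(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i % 7;
    return sum;
}

int main(void)
{
    printf("%ld\n", work(10000000L));
    return 0;
}
EOF

# Keep %bp as the frame pointer in every object you build.
CFLAGS="-O2 -g -fno-omit-frame-pointer"

if command -v gcc >/dev/null 2>&1; then
    gcc $CFLAGS -o fp_demo fp_demo.c
fi

# '-g' defaults to frame-pointer callchains ('--call-graph fp').
if command -v perf >/dev/null 2>&1 && [ -x ./fp_demo ]; then
    perf record -g -o fp_demo.perf.data ./fp_demo || echo "perf record failed" >&2
fi
```

Chains that pass through libraries built without frame pointers will
likely still be cut short, which is why the libraries matter as much as
the main binary.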
You may want to specify how much stack is collected when using
--call-graph dwarf; please look at the 'record' help:
    -g                    enables call-graph recording
        --call-graph <mode[,dump_size]>
                          setup and enables call-graph (stack chain/backtrace)
                          recording: fp dwarf
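For instance, a sketch of asking for a specific stack dump size. The
16384 byte value and the ./myapp name are only illustrative assumptions;
perf-record(1) documents 8192 bytes as the default, and a bigger dump
lets the unwinder reach deeper frames at the cost of a larger perf.data:

```shell
# Illustrative values: pick a size that covers your deepest stacks.
dump_size=16384
app=./myapp

# The dump size goes after the mode, separated by a comma.
cmd="perf record --call-graph dwarf,${dump_size} ${app}"
echo "$cmd"

# Only run it when perf and the (hypothetical) binary are both present.
if command -v perf >/dev/null 2>&1 && [ -x "$app" ]; then
    $cmd || echo "perf record failed" >&2
fi
```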
Ah, here is another example, this time crossing from kernel to userspace, all the way
back from a spin lock operation to the memory allocation routines used by
firefox, after I pressed -> and zoomed into the kernel DSO:
[acme@zoo linux]$ head -50 perf.hist.1
+ 0,32% firefox [k] clear_page_c_e
+ 0,26% firefox [k] copy_user_enhanced_fast_string
+ 0,26% firefox [k] page_fault
+ 0,20% firefox [k] get_page_from_freelist
+ 0,20% firefox [k] __list_del_entry
+ 0,20% firefox [k] mem_cgroup_page_lruvec
+ 0,16% firefox [k] lookup_page_cgroup_used
- 0,14% firefox [k] _raw_spin_lock
- _raw_spin_lock
- 37,73% try_to_wake_up
- 64,20% wake_up_state
wake_futex
futex_wake
do_futex
sys_futex
system_call
- __lll_unlock_wake
37,39% 0x7f5e81b10400
32,07% 0x7f5e63e81bd0
- 30,54% arena_malloc
malloc
_ZN14nsStringBuffer5AllocEm
_ZN18nsAString_internal10MutatePrepEjPPDsPj
_ZN18nsAString_internal19ReplacePrepInternalEjjjj
+ 35,80% default_wake_function
- 17,55% handle_mm_fault
__do_page_fault
do_page_fault
- page_fault
+ 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv
- 47,12% arena_dalloc
_ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm
_ZN15SnowWhiteKillerD2Ev
_ZN16nsCycleCollector13FreeSnowWhiteEb
_ZN16nsCycleCollector8ShutdownEv
_Z25nsCycleCollector_shutdownv
_ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager
_ZN18ScopedXPCOMStartupD2Ev
_ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData
XRE_main
_ZL7do_mainiPPcP7nsIFile
main
__libc_start_main
_start
+ 9,54% do_futex
+ 9,36% do_cow_fault
+ 9,33% tick_do_update_jiffies64
+ 8,72% futex_wait_setup
+ 7,62% unix_stream_recvmsg
+ 0,14% firefox [k] mem_cgroup_begin_page_stat
[acme@zoo linux]$
- Arnaldo