linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: AKel <ankelly006@hotmail.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Using perf to generate a call stack from a kernel function to the user space caller
Date: Wed, 21 Jan 2015 09:40:21 -0300	[thread overview]
Message-ID: <20150121124021.GB3420@kernel.org> (raw)
In-Reply-To: <loom.20150121T090925-23@post.gmane.org>

Em Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel escreveu:
> Thanks Rick for your reply. Unfortunately I'm in the big bad corporate 
> world where a simple minion such as myself is not granted sudo access. 

That should not be a problem in most of the cases for profiling your own
apps.

> I've tried passing the callgraph (-g) flag to perf, however I can only 
> recover the call stack within the kernel, it begins at what I suspect is 

Because it will try, by default, to collect %bp (frame pointer)
callchains, which unfortunately is not usually enabled for user space
programs, that use the %bp register for other purposes.

> the kernel entry point. Ideally I could generate a complete call stack.

You can try using the DWARF CFI (Call Frame Information), by using
instead:

  $ perf record --call-graph dwarf firefox

Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
callgraphs:

  $ perf report --no-children # expand callgraphs, press P, press q
  $ head -45 perf.hist.0  
  -    3,47%  Image Scaler     libxul.so                      [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
       _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
       _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
       _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
       _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv
       _ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE
       _ZN7mozilla5image11ScaleRunner3RunEv
       _ZN8nsThread16ProcessNextEventEbPb
       _Z19NS_ProcessNextEventP9nsIThreadb
       _ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE
       _ZN11MessageLoop3RunEv
       _ZN8nsThread10ThreadFuncEPv
       _pt_root
       start_thread
       __clone
  -    3,00%  Cache2 I/O       libc-2.18.so                   [.] __memmove_ssse3_back
     - __memmove_ssse3_back
        - 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_
           - 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE
                _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
                _ZN7mozilla3net10CacheIndex12ParseRecordsEv
                _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
                _ZN7mozilla3net9ReadEvent3RunEv
                _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEv
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
                _pt_root
                start_thread
                __clone
           - 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE
                _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
                _ZN7mozilla3net10CacheIndex12ParseRecordsEv
                _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
                _ZN7mozilla3net9ReadEvent3RunEv
                _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEv
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
                _pt_root
                start_thread
                __clone
        + 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
  +    1,79%  firefox          libz.so.1.2.8                  [.] inflate_fast
  +    1,76%  Cache2 I/O       libxul.so                      [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
  +    1,39%  firefox          libpthread-2.18.so             [.] pthread_mutex_lock
  +    1,37%  firefox          firefox                        [.] arena_dalloc
  $ 


These C++ methods that needs unmangling, and I have the firefox-debuginfo
package matching the binary... Have to check this.

Anyway, either you have your binaries (your main binary and the libraries it
uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can
use '-g', that, in its default form is equivalent to '--call-graph fp', or you
will have to ask that the kernel collects stack dumps to then use CFI to do the
callchains, i.e. use '--call-graph dwarf'.

You may want to state how much stack is collected when using --call-graph dwarf,
please look at the 'record' help:

  -g                    enables call-graph recording
      --call-graph <mode[,dump_size]>
                        setup and enables call-graph (stack chain/backtrace)
                        recording: fp dwarf

Ah, here is another example, this time crossing from kernel to userspace, all the way
back from a spin lock operation to the memory allocation routines used by
firefox, after I pressed -> and zoomed into the kernel DSO:

[acme@zoo linux]$ head -50 perf.hist.1 
+    0,32%  firefox          [k] clear_page_c_e
+    0,26%  firefox          [k] copy_user_enhanced_fast_string
+    0,26%  firefox          [k] page_fault
+    0,20%  firefox          [k] get_page_from_freelist
+    0,20%  firefox          [k] __list_del_entry
+    0,20%  firefox          [k] mem_cgroup_page_lruvec
+    0,16%  firefox          [k] lookup_page_cgroup_used
-    0,14%  firefox          [k] _raw_spin_lock
   - _raw_spin_lock
      - 37,73% try_to_wake_up
         - 64,20% wake_up_state
              wake_futex
              futex_wake
              do_futex
              sys_futex
              system_call
            - __lll_unlock_wake
                 37,39% 0x7f5e81b10400
                 32,07% 0x7f5e63e81bd0
               - 30,54% arena_malloc
                    malloc
                    _ZN14nsStringBuffer5AllocEm
                    _ZN18nsAString_internal10MutatePrepEjPPDsPj
                    _ZN18nsAString_internal19ReplacePrepInternalEjjjj
         + 35,80% default_wake_function
      - 17,55% handle_mm_fault
           __do_page_fault
           do_page_fault
         - page_fault
            + 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv
            - 47,12% arena_dalloc
                 _ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm
                 _ZN15SnowWhiteKillerD2Ev
                 _ZN16nsCycleCollector13FreeSnowWhiteEb
                 _ZN16nsCycleCollector8ShutdownEv
                 _Z25nsCycleCollector_shutdownv
                 _ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager
                 _ZN18ScopedXPCOMStartupD2Ev
                 _ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData
                 XRE_main
                 _ZL7do_mainiPPcP7nsIFile
                 main
                 __libc_start_main
                 _start
      + 9,54% do_futex
      + 9,36% do_cow_fault
      + 9,33% tick_do_update_jiffies64
      + 8,72% futex_wait_setup
      + 7,62% unix_stream_recvmsg
+    0,14%  firefox          [k] mem_cgroup_begin_page_stat
[acme@zoo linux]$

- Arnaldo

  reply	other threads:[~2015-01-21 12:40 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
2015-01-20 18:40 ` Rick Jones
2015-01-21  9:42   ` AKel
2015-01-21 12:40     ` Arnaldo Carvalho de Melo [this message]
2015-01-21 14:56       ` Arnaldo Carvalho de Melo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150121124021.GB3420@kernel.org \
    --to=acme@kernel.org \
    --cc=ankelly006@hotmail.com \
    --cc=linux-perf-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).