linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Using perf to generate a call stack from a kernel function to the user space caller
@ 2015-01-20 17:08 AKel
  2015-01-20 18:40 ` Rick Jones
  0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-20 17:08 UTC (permalink / raw)
  To: linux-perf-users

Hi,

I'm trying to profile an application which is spending quite a bit of time 
in the kernel (~40% in do_raw_spin_lock for example). To fully understand 
what is happening I would like to generate a call stack using perf (or 
otherwise) to see where the calls to this function are originating from. 
Is this possible? My hands are slightly tied in that I don't have sudo 
access to the system I'm working on, so a solution which accounts for this 
would be greatly appreciated. Even transferring files to/from the system 
is very difficult so I'm rather restricted in what i can do.

Thanks,
A

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
@ 2015-01-20 18:40 ` Rick Jones
  2015-01-21  9:42   ` AKel
  0 siblings, 1 reply; 5+ messages in thread
From: Rick Jones @ 2015-01-20 18:40 UTC (permalink / raw)
  To: AKel, linux-perf-users

On 01/20/2015 09:08 AM, AKel wrote:
> Hi,
>
> I'm trying to profile an application which is spending quite a bit of time
> in the kernel (~40% in do_raw_spin_lock for example). To fully understand
> what is happening I would like to generate a call stack using perf (or
> otherwise) to see where the calls to this function are originating from.
> Is this possible? My hands are slightly tied in that I don't have sudo
> access to the system I'm working on, so a solution which accounts for this
> would be greatly appreciated. Even transferring files to/from the system
> is very difficult so I'm rather restricted in what i can do.

I suspect you will have to resolve the sudo issue by getting sudo. 
Ignoring that for the time-being, it sounds like you could use the 
call-graph functionality of perf record/report/top?

rick jones

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-20 18:40 ` Rick Jones
@ 2015-01-21  9:42   ` AKel
  2015-01-21 12:40     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-21  9:42 UTC (permalink / raw)
  To: linux-perf-users


Thanks Rick for your reply. Unfortunately I'm in the big bad corporate 
world where a simple minion such as myself is not granted sudo access. 
I've tried passing the callgraph (-g) flag to perf, however I can only 
recover the call stack within the kernel, it begins at what I suspect is 
the kernel entry point. Ideally I could generate a complete call stack.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-21  9:42   ` AKel
@ 2015-01-21 12:40     ` Arnaldo Carvalho de Melo
  2015-01-21 14:56       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 12:40 UTC (permalink / raw)
  To: AKel; +Cc: linux-perf-users

Em Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel escreveu:
> Thanks Rick for your reply. Unfortunately I'm in the big bad corporate 
> world where a simple minion such as myself is not granted sudo access. 

That should not be a problem in most of the cases for profiling your own
apps.

> I've tried passing the callgraph (-g) flag to perf, however I can only 
> recover the call stack within the kernel, it begins at what I suspect is 

Because it will try, by default, to collect %bp (frame pointer)
callchains, which unfortunately is not usually enabled for user space
programs, that use the %bp register for other purposes.

> the kernel entry point. Ideally I could generate a complete call stack.

You can try using the DWARF CFI (Call Frame Information), by using
instead:

  $ perf record --call-graph dwarf firefox

Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
callgraphs:

  $ perf report --no-children # expand callgraphs, press P, press q
  $ head -45 perf.hist.0  
  -    3,47%  Image Scaler     libxul.so                      [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
       _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
       _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
       _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
       _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv
       _ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE
       _ZN7mozilla5image11ScaleRunner3RunEv
       _ZN8nsThread16ProcessNextEventEbPb
       _Z19NS_ProcessNextEventP9nsIThreadb
       _ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE
       _ZN11MessageLoop3RunEv
       _ZN8nsThread10ThreadFuncEPv
       _pt_root
       start_thread
       __clone
  -    3,00%  Cache2 I/O       libc-2.18.so                   [.] __memmove_ssse3_back
     - __memmove_ssse3_back
        - 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_
           - 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE
                _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
                _ZN7mozilla3net10CacheIndex12ParseRecordsEv
                _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
                _ZN7mozilla3net9ReadEvent3RunEv
                _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEv
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
                _pt_root
                start_thread
                __clone
           - 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE
                _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
                _ZN7mozilla3net10CacheIndex12ParseRecordsEv
                _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
                _ZN7mozilla3net9ReadEvent3RunEv
                _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEv
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
                _pt_root
                start_thread
                __clone
        + 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
  +    1,79%  firefox          libz.so.1.2.8                  [.] inflate_fast
  +    1,76%  Cache2 I/O       libxul.so                      [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
  +    1,39%  firefox          libpthread-2.18.so             [.] pthread_mutex_lock
  +    1,37%  firefox          firefox                        [.] arena_dalloc
  $ 


These C++ methods that needs unmangling, and I have the firefox-debuginfo
package matching the binary... Have to check this.

Anyway, either you have your binaries (your main binary and the libraries it
uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can
use '-g', that, in its default form is equivalent to '--call-graph fp', or you
will have to ask that the kernel collects stack dumps to then use CFI to do the
callchains, i.e. use '--call-graph dwarf'.

You may want to state how much stack is collected when using --call-graph dwarf,
please look at the 'record' help:

  -g                    enables call-graph recording
      --call-graph <mode[,dump_size]>
                        setup and enables call-graph (stack chain/backtrace)
                        recording: fp dwarf

Ah, here is another example, this time crossing from kernel to userspace, all the way
back from a spin lock operation to the memory allocation routines used by
firefox, after I pressed -> and zoomed into the kernel DSO:

[acme@zoo linux]$ head -50 perf.hist.1 
+    0,32%  firefox          [k] clear_page_c_e
+    0,26%  firefox          [k] copy_user_enhanced_fast_string
+    0,26%  firefox          [k] page_fault
+    0,20%  firefox          [k] get_page_from_freelist
+    0,20%  firefox          [k] __list_del_entry
+    0,20%  firefox          [k] mem_cgroup_page_lruvec
+    0,16%  firefox          [k] lookup_page_cgroup_used
-    0,14%  firefox          [k] _raw_spin_lock
   - _raw_spin_lock
      - 37,73% try_to_wake_up
         - 64,20% wake_up_state
              wake_futex
              futex_wake
              do_futex
              sys_futex
              system_call
            - __lll_unlock_wake
                 37,39% 0x7f5e81b10400
                 32,07% 0x7f5e63e81bd0
               - 30,54% arena_malloc
                    malloc
                    _ZN14nsStringBuffer5AllocEm
                    _ZN18nsAString_internal10MutatePrepEjPPDsPj
                    _ZN18nsAString_internal19ReplacePrepInternalEjjjj
         + 35,80% default_wake_function
      - 17,55% handle_mm_fault
           __do_page_fault
           do_page_fault
         - page_fault
            + 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv
            - 47,12% arena_dalloc
                 _ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm
                 _ZN15SnowWhiteKillerD2Ev
                 _ZN16nsCycleCollector13FreeSnowWhiteEb
                 _ZN16nsCycleCollector8ShutdownEv
                 _Z25nsCycleCollector_shutdownv
                 _ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager
                 _ZN18ScopedXPCOMStartupD2Ev
                 _ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData
                 XRE_main
                 _ZL7do_mainiPPcP7nsIFile
                 main
                 __libc_start_main
                 _start
      + 9,54% do_futex
      + 9,36% do_cow_fault
      + 9,33% tick_do_update_jiffies64
      + 8,72% futex_wait_setup
      + 7,62% unix_stream_recvmsg
+    0,14%  firefox          [k] mem_cgroup_begin_page_stat
[acme@zoo linux]$

- Arnaldo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-21 12:40     ` Arnaldo Carvalho de Melo
@ 2015-01-21 14:56       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 14:56 UTC (permalink / raw)
  To: AKel; +Cc: linux-perf-users

Em Wed, Jan 21, 2015 at 09:40:21AM -0300, Arnaldo Carvalho de Melo escreveu:
> Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
> callgraphs:
 
>   $ perf report --no-children # expand callgraphs, press P, press q
>   $ head -45 perf.hist.0  
>   -    3,47%  Image Scaler     libxul.so                      [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
>        _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
>        _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
>        _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv

<SNIP>

> These C++ methods that needs unmangling, and I have the firefox-debuginfo
> package matching the binary... Have to check this.

  [acme@zoo linux]$ make -C tools/perf O=/tmp/build/perf install-bin
  make: Entering directory `/home/git/linux/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
  config/Makefile:597: No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demangling

  Auto-detecting system features:
  ...                         dwarf: [ on  ]
  ...                         glibc: [ on  ]
  ...                          gtk2: [ on  ]
  ...                      libaudit: [ on  ]
  ...                        libbfd: [ OFF ]

after installing binutils-devel it gets symbol demangling:
 
-    3,47%  Image Scaler     libxul.so                      [.] skia::BGRAConvolve2D
     skia::BGRAConvolve2D
     skia::ImageOperations::ResizeBasic
     skia::ImageOperations::Resize
     skia::ImageOperations::Resize
     mozilla::gfx::Scale
     mozilla::image::ScaleRunner::Run
     nsThread::ProcessNextEvent
     NS_ProcessNextEvent
     mozilla::ipc::MessagePumpForNonMainThreads::Run
     MessageLoop::Run
     nsThread::ThreadFunc
     _pt_root
     start_thread
     __clone
+    3,00%  Cache2 I/O       libc-2.18.so                   [.] __memmove_ssse3_back
+    1,79%  firefox          libz.so.1.2.8                  [.] inflate_fast
+    1,76%  Cache2 I/O       libxul.so                      [.] nsTArray_Impl<mozilla::net::CacheIndexRecord*, nsTArrayInfallibleAllocator>::RemoveElement<mozilla::net::CacheIndexRecord*, nsDefaultComparator<mozilla::net::CacheIndexRecord*, mozilla::net::CacheIndexRecord*> >
+    1,39%  firefox          libpthread-2.18.so             [.] pthread_mutex_lock
+    1,37%  firefox          firefox                        [.] arena_dalloc
+    1,27%  firefox          libpthread-2.18.so             [.] pthread_mutex_unlock
+    1,02%  firefox          firefox                        [.] arena_malloc
+    0,68%  firefox          libxul.so                      [.] PLDHashTable::SearchTable
+    0,58%  firefox          libxul.so                      [.] js::AtomizeChars<unsigned char>
+    0,55%  firefox          libxul.so                      [.] Interpret
+    0,52%  firefox          libxul.so                      [.] js::ObjectImpl::nativeLookup
+    0,49%  firefox          libc-2.18.so                   [.] __memcpy_sse2_unaligned
+    0,47%  firefox          libxul.so                      [.] NewObject
+    0,45%  firefox          p11-kit-trust.so               [.] lookup_or_create_bucket
+    0,41%  mozStorage #5    libpthread-2.18.so             [.] pthread_mutex_lock
+    0,39%  firefox          libxul.so                      [.] js::EmptyShape::getInitialShape
+    0,38%  mozStorage #5    libpthread-2.18.so             [.] pthread_mutex_unlock
+    0,35%  ImageDecoder #2  libz.so.1.2.8                  [.] inflate_fast
+    0,35%  firefox          libxul.so                      [.] mozilla::gfx::AlphaBoxBlur::BoxBlur_SSE2
+    0,34%  firefox          ld-2.18.so                     [.] do_lookup_x
+    0,34%  firefox          libxul.so                      [.] JSObject::putProperty<(js::ExecutionMode)0>
+    0,34%  firefox          libxul.so                      [.] js::ShapeTable::search
+    0,33%  firefox          libxul.so                      [.] js::baseops::GetProperty
-    0,32%  firefox          [kernel.vmlinux]               [k] clear_page_c_e
     clear_page_c_e
   - __alloc_pages_nodemask
      - 97,50% alloc_pages_vma
         - 95,75% handle_mm_fault
              __do_page_fault
              do_page_fault
            - page_fault
               - 19,82% __memcpy_sse2_unaligned
                  - 35,58% _XRead
                       _XReadPad
                       XGetImage
                       _get_image_surface
                       _cairo_xlib_surface_acquire_source_image
                       _cairo_surface_acquire_source_image
                       _pixman_image_for_pattern
                       _clip_and_composite_boxes
                       _cairo_image_surface_paint
                       _cairo_surface_paint
                       _cairo_gstate_paint
                       INT__moz_cairo_paint
                       _moz_cairo_paint_with_alpha
                       mozilla::gfx::DrawTargetCairo::DrawSurface
                       mozilla::image::imgFrame::LockImageData
                       mozilla::image::RasterImage::RequestScale
                       mozilla::image::RasterImage::OptimalImageSizeForDest
                       DrawImageInternal
                       nsLayoutUtils::DrawBackgroundImage
                       nsImageRenderer::Draw
                       nsImageRenderer::DrawBackground
                       nsCSSRendering::PaintBackgroundWithSC
                       nsCSSRendering::PaintBackground
                       nsDisplayBackgroundImage::PaintInternal
                       mozilla::FrameLayerBuilder::PaintItems
                       mozilla::FrameLayerBuilder::DrawThebesLayer
                       mozilla::layers::BasicThebesLayer::PaintBuffer
                       mozilla::layers::BasicThebesLayer::Validate
                       mozilla::layers::BasicContainerLayer::Validate
                       mozilla::layers::BasicContainerLayer::Validate
                       mozilla::layers::BasicLayerManager::EndTransactionInternal
                       nsDisplayList::PaintForFrame
                  - 22,39% js::jit::Assembler::executableCopy
                       js::jit::JitCode::copyFrom
                       js::jit::Linker::newCode<(js::AllowGC)1>
                       js::jit::ICStubCompiler::getStubCode
                       js::jit::ICCallScriptedCompiler::getStub
                       js::jit::TryAttachCallStub
                       js::jit::DoCallFallback
                       0x7f5e847ffdb0
                  - 21,73% js::NewObjectWithClassProtoCommon
                       js::jit::NewInitObject
                       js::jit::DoNewObject
                       0x7f5e84800ec4
                  + 20,30% updatewindow
               + 15,70% arena_dalloc
               + 12,82% arena_malloc
               + 12,73% js::gc::Chunk::fetchNextDecommittedArena
               + 7,31% skb_copy_datagram_iovec
               + 7,04% sse2_blt
               + 4,45% NS_CopySegmentToBuffer
               + 4,18% js::InflateString
               + 4,18% mozilla::VectorBase<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy, js::Vector<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy> >::growStorageBy
               + 4,13% mozilla::image::imgFrame::Init
               + 3,86% _cairo_image_surface_span
               + 3,78% sse2_fill
         + 4,25% do_wp_page
      + 2,50% alloc_pages_current
+    0,32%  ImageDecoder #3  libz.so.1.2.8                  [.] inflate_fast


And if you press 'V' you get the DSO name right after the symbol name:


+    0,32%  firefox          [k] clear_page_c_e
+    0,26%  firefox          [k] copy_user_enhanced_fast_string
+    0,26%  firefox          [k] page_fault
+    0,20%  firefox          [k] get_page_from_freelist
+    0,20%  firefox          [k] __list_del_entry
+    0,20%  firefox          [k] mem_cgroup_page_lruvec
+    0,16%  firefox          [k] lookup_page_cgroup_used
-    0,14%  firefox          [k] _raw_spin_lock
   - _raw_spin_lock [kernel.vmlinux]
      - 37,73% try_to_wake_up [kernel.vmlinux]
         - 64,20% wake_up_state [kernel.vmlinux]
              wake_futex [kernel.vmlinux]
              futex_wake [kernel.vmlinux]
              do_futex [kernel.vmlinux]
              sys_futex [kernel.vmlinux]
              system_call [kernel.vmlinux]
            - __lll_unlock_wake libpthread-2.18.so
                 37,39% 0x7f5e81b10400 libxul.so
                 32,07% 0x7f5e63e81bd0 unknown
               - 30,54% arena_malloc firefox
                    malloc firefox
                    nsStringBuffer::Alloc libxul.so
                    nsAString_internal::MutatePrep libxul.so
                    nsAString_internal::ReplacePrepInternal libxul.so
         + 35,80% default_wake_function [kernel.vmlinux]
      - 17,55% handle_mm_fault [kernel.vmlinux]
           __do_page_fault [kernel.vmlinux]
           do_page_fault [kernel.vmlinux]
         - page_fault [kernel.vmlinux]
            - 52,88% js::gc::Chunk::fetchNextDecommittedArena libxul.so
                 js::gc::Chunk::allocateArena libxul.so
                 js::gc::ArenaLists::allocateFromArenaInline libxul.so
                 js::gc::ArenaLists::refillFreeList<(js::AllowGC)1> libxul.so
                 js::PropertyTree::getChild libxul.so
                 JSObject::getChildProperty libxul.so
                 JSObject::addPropertyInternal<(js::ExecutionMode)0> libxul.so
                 JSObject::putProperty<(js::ExecutionMode)0> libxul.so
                 DefinePropertyOrElement<(js::ExecutionMode)0> libxul.so
                 js::DefineNativeProperty libxul.so
                 JSObject::defineGeneric libxul.so
                 js::DefineFunction libxul.so
                 JS_DefineFunctions libxul.so
                 js::GlobalObject::resolveConstructor libxul.so
                 js::GetBuiltinPrototype libxul.so
                 js::NewDenseUnallocatedArray libxul.so
                 js::XDRObjectLiteral<(js::XDRMode)1> libxul.so
                 js::XDRScript<(js::XDRMode)1> libxul.so
                 js::XDRState<(js::XDRMode)1>::codeScript libxul.so
                 JS_DecodeScript libxul.so
                 ReadCachedScript libxul.so
                 mozJSComponentLoader::ObjectForLocation libxul.so
                 mozJSComponentLoader::ImportInto libxul.so
                 mozJSComponentLoader::Import libxul.so
                 nsXPCComponents_Utils::Import libxul.so
                 NS_InvokeByIndex libxul.so
                 XPCWrappedNative::CallMethod libxul.so
                 XPC_WN_CallMethod libxul.so
                 js::Invoke libxul.so
            - 47,12% arena_dalloc firefox
                 nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::ShiftData libxul.so
                 SnowWhiteKiller::~SnowWhiteKiller libxul.so
                 nsCycleCollector::FreeSnowWhite libxul.so
                 nsCycleCollector::Shutdown libxul.so
                 nsCycleCollector_shutdown libxul.so
                 mozilla::ShutdownXPCOM libxul.so
                 ScopedXPCOMStartup::~ScopedXPCOMStartup libxul.so
                 XREMain::XRE_main libxul.so
                 XRE_main libxul.so
                 do_main firefox
                 main firefox
                 __libc_start_main libc-2.18.so
                 _start firefox
      + 9,54% do_futex [kernel.vmlinux]

- Arnaldo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-01-21 14:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
2015-01-20 18:40 ` Rick Jones
2015-01-21  9:42   ` AKel
2015-01-21 12:40     ` Arnaldo Carvalho de Melo
2015-01-21 14:56       ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).