* Using perf to generate a call stack from a kernel function to the user space caller
@ 2015-01-20 17:08 AKel
2015-01-20 18:40 ` Rick Jones
0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-20 17:08 UTC (permalink / raw)
To: linux-perf-users
Hi,
I'm trying to profile an application which is spending quite a bit of time
in the kernel (~40% in do_raw_spin_lock for example). To fully understand
what is happening I would like to generate a call stack using perf (or
otherwise) to see where the calls to this function are originating from.
Is this possible? My hands are slightly tied in that I don't have sudo
access to the system I'm working on, so a solution which accounts for this
would be greatly appreciated. Even transferring files to/from the system
is very difficult so I'm rather restricted in what i can do.
Thanks,
A
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
@ 2015-01-20 18:40 ` Rick Jones
2015-01-21 9:42 ` AKel
0 siblings, 1 reply; 5+ messages in thread
From: Rick Jones @ 2015-01-20 18:40 UTC (permalink / raw)
To: AKel, linux-perf-users
On 01/20/2015 09:08 AM, AKel wrote:
> Hi,
>
> I'm trying to profile an application which is spending quite a bit of time
> in the kernel (~40% in do_raw_spin_lock for example). To fully understand
> what is happening I would like to generate a call stack using perf (or
> otherwise) to see where the calls to this function are originating from.
> Is this possible? My hands are slightly tied in that I don't have sudo
> access to the system I'm working on, so a solution which accounts for this
> would be greatly appreciated. Even transferring files to/from the system
> is very difficult so I'm rather restricted in what i can do.
I suspect you will have to resolve the sudo issue by getting sudo.
Ignoring that for the time-being, it sounds like you could use the
call-graph functionality of perf record/report/top?
rick jones
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-20 18:40 ` Rick Jones
@ 2015-01-21 9:42 ` AKel
2015-01-21 12:40 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-21 9:42 UTC (permalink / raw)
To: linux-perf-users
Thanks Rick for your reply. Unfortunately I'm in the big bad corporate
world where a simple minion such as myself is not granted sudo access.
I've tried passing the callgraph (-g) flag to perf, however I can only
recover the call stack within the kernel, it begins at what I suspect is
the kernel entry point. Ideally I could generate a complete call stack.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-21 9:42 ` AKel
@ 2015-01-21 12:40 ` Arnaldo Carvalho de Melo
2015-01-21 14:56 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 12:40 UTC (permalink / raw)
To: AKel; +Cc: linux-perf-users
Em Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel escreveu:
> Thanks Rick for your reply. Unfortunately I'm in the big bad corporate
> world where a simple minion such as myself is not granted sudo access.
That should not be a problem in most of the cases for profiling your own
apps.
> I've tried passing the callgraph (-g) flag to perf, however I can only
> recover the call stack within the kernel, it begins at what I suspect is
Because it will try, by default, to collect %bp (frame pointer)
callchains, which unfortunately is not usually enabled for user space
programs, that use the %bp register for other purposes.
> the kernel entry point. Ideally I could generate a complete call stack.
You can try using the DWARF CFI (Call Frame Information), by using
instead:
$ perf record --call-graph dwarf firefox
Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
callgraphs:
$ perf report --no-children # expand callgraphs, press P, press q
$ head -45 perf.hist.0
- 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
_ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
_ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
_ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
_ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv
_ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE
_ZN7mozilla5image11ScaleRunner3RunEv
_ZN8nsThread16ProcessNextEventEbPb
_Z19NS_ProcessNextEventP9nsIThreadb
_ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE
_ZN11MessageLoop3RunEv
_ZN8nsThread10ThreadFuncEPv
_pt_root
start_thread
__clone
- 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back
- __memmove_ssse3_back
- 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_
- 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE
_ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
_ZN7mozilla3net10CacheIndex12ParseRecordsEv
_ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
_ZN7mozilla3net9ReadEvent3RunEv
_ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
_ZN7mozilla3net13CacheIOThread10ThreadFuncEv
_ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
_pt_root
start_thread
__clone
- 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE
_ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
_ZN7mozilla3net10CacheIndex12ParseRecordsEv
_ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
_ZN7mozilla3net9ReadEvent3RunEv
_ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
_ZN7mozilla3net13CacheIOThread10ThreadFuncEv
_ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
_pt_root
start_thread
__clone
+ 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
+ 1,79% firefox libz.so.1.2.8 [.] inflate_fast
+ 1,76% Cache2 I/O libxul.so [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
+ 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock
+ 1,37% firefox firefox [.] arena_dalloc
$
These C++ methods that needs unmangling, and I have the firefox-debuginfo
package matching the binary... Have to check this.
Anyway, either you have your binaries (your main binary and the libraries it
uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can
use '-g', that, in its default form is equivalent to '--call-graph fp', or you
will have to ask that the kernel collects stack dumps to then use CFI to do the
callchains, i.e. use '--call-graph dwarf'.
You may want to state how much stack is collected when using --call-graph dwarf,
please look at the 'record' help:
-g enables call-graph recording
--call-graph <mode[,dump_size]>
setup and enables call-graph (stack chain/backtrace)
recording: fp dwarf
Ah, here is another example, this time crossing from kernel to userspace, all the way
back from a spin lock operation to the memory allocation routines used by
firefox, after I pressed -> and zoomed into the kernel DSO:
[acme@zoo linux]$ head -50 perf.hist.1
+ 0,32% firefox [k] clear_page_c_e
+ 0,26% firefox [k] copy_user_enhanced_fast_string
+ 0,26% firefox [k] page_fault
+ 0,20% firefox [k] get_page_from_freelist
+ 0,20% firefox [k] __list_del_entry
+ 0,20% firefox [k] mem_cgroup_page_lruvec
+ 0,16% firefox [k] lookup_page_cgroup_used
- 0,14% firefox [k] _raw_spin_lock
- _raw_spin_lock
- 37,73% try_to_wake_up
- 64,20% wake_up_state
wake_futex
futex_wake
do_futex
sys_futex
system_call
- __lll_unlock_wake
37,39% 0x7f5e81b10400
32,07% 0x7f5e63e81bd0
- 30,54% arena_malloc
malloc
_ZN14nsStringBuffer5AllocEm
_ZN18nsAString_internal10MutatePrepEjPPDsPj
_ZN18nsAString_internal19ReplacePrepInternalEjjjj
+ 35,80% default_wake_function
- 17,55% handle_mm_fault
__do_page_fault
do_page_fault
- page_fault
+ 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv
- 47,12% arena_dalloc
_ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm
_ZN15SnowWhiteKillerD2Ev
_ZN16nsCycleCollector13FreeSnowWhiteEb
_ZN16nsCycleCollector8ShutdownEv
_Z25nsCycleCollector_shutdownv
_ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager
_ZN18ScopedXPCOMStartupD2Ev
_ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData
XRE_main
_ZL7do_mainiPPcP7nsIFile
main
__libc_start_main
_start
+ 9,54% do_futex
+ 9,36% do_cow_fault
+ 9,33% tick_do_update_jiffies64
+ 8,72% futex_wait_setup
+ 7,62% unix_stream_recvmsg
+ 0,14% firefox [k] mem_cgroup_begin_page_stat
[acme@zoo linux]$
- Arnaldo
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-21 12:40 ` Arnaldo Carvalho de Melo
@ 2015-01-21 14:56 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 14:56 UTC (permalink / raw)
To: AKel; +Cc: linux-perf-users
Em Wed, Jan 21, 2015 at 09:40:21AM -0300, Arnaldo Carvalho de Melo escreveu:
> Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
> callgraphs:
> $ perf report --no-children # expand callgraphs, press P, press q
> $ head -45 perf.hist.0
> - 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
> _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
> _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
> _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
<SNIP>
> These C++ methods that needs unmangling, and I have the firefox-debuginfo
> package matching the binary... Have to check this.
[acme@zoo linux]$ make -C tools/perf O=/tmp/build/perf install-bin
make: Entering directory `/home/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
config/Makefile:597: No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demangling
Auto-detecting system features:
... dwarf: [ on ]
... glibc: [ on ]
... gtk2: [ on ]
... libaudit: [ on ]
... libbfd: [ OFF ]
after installing binutils-devel it gets symbol demangling:
- 3,47% Image Scaler libxul.so [.] skia::BGRAConvolve2D
skia::BGRAConvolve2D
skia::ImageOperations::ResizeBasic
skia::ImageOperations::Resize
skia::ImageOperations::Resize
mozilla::gfx::Scale
mozilla::image::ScaleRunner::Run
nsThread::ProcessNextEvent
NS_ProcessNextEvent
mozilla::ipc::MessagePumpForNonMainThreads::Run
MessageLoop::Run
nsThread::ThreadFunc
_pt_root
start_thread
__clone
+ 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back
+ 1,79% firefox libz.so.1.2.8 [.] inflate_fast
+ 1,76% Cache2 I/O libxul.so [.] nsTArray_Impl<mozilla::net::CacheIndexRecord*, nsTArrayInfallibleAllocator>::RemoveElement<mozilla::net::CacheIndexRecord*, nsDefaultComparator<mozilla::net::CacheIndexRecord*, mozilla::net::CacheIndexRecord*> >
+ 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock
+ 1,37% firefox firefox [.] arena_dalloc
+ 1,27% firefox libpthread-2.18.so [.] pthread_mutex_unlock
+ 1,02% firefox firefox [.] arena_malloc
+ 0,68% firefox libxul.so [.] PLDHashTable::SearchTable
+ 0,58% firefox libxul.so [.] js::AtomizeChars<unsigned char>
+ 0,55% firefox libxul.so [.] Interpret
+ 0,52% firefox libxul.so [.] js::ObjectImpl::nativeLookup
+ 0,49% firefox libc-2.18.so [.] __memcpy_sse2_unaligned
+ 0,47% firefox libxul.so [.] NewObject
+ 0,45% firefox p11-kit-trust.so [.] lookup_or_create_bucket
+ 0,41% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_lock
+ 0,39% firefox libxul.so [.] js::EmptyShape::getInitialShape
+ 0,38% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_unlock
+ 0,35% ImageDecoder #2 libz.so.1.2.8 [.] inflate_fast
+ 0,35% firefox libxul.so [.] mozilla::gfx::AlphaBoxBlur::BoxBlur_SSE2
+ 0,34% firefox ld-2.18.so [.] do_lookup_x
+ 0,34% firefox libxul.so [.] JSObject::putProperty<(js::ExecutionMode)0>
+ 0,34% firefox libxul.so [.] js::ShapeTable::search
+ 0,33% firefox libxul.so [.] js::baseops::GetProperty
- 0,32% firefox [kernel.vmlinux] [k] clear_page_c_e
clear_page_c_e
- __alloc_pages_nodemask
- 97,50% alloc_pages_vma
- 95,75% handle_mm_fault
__do_page_fault
do_page_fault
- page_fault
- 19,82% __memcpy_sse2_unaligned
- 35,58% _XRead
_XReadPad
XGetImage
_get_image_surface
_cairo_xlib_surface_acquire_source_image
_cairo_surface_acquire_source_image
_pixman_image_for_pattern
_clip_and_composite_boxes
_cairo_image_surface_paint
_cairo_surface_paint
_cairo_gstate_paint
INT__moz_cairo_paint
_moz_cairo_paint_with_alpha
mozilla::gfx::DrawTargetCairo::DrawSurface
mozilla::image::imgFrame::LockImageData
mozilla::image::RasterImage::RequestScale
mozilla::image::RasterImage::OptimalImageSizeForDest
DrawImageInternal
nsLayoutUtils::DrawBackgroundImage
nsImageRenderer::Draw
nsImageRenderer::DrawBackground
nsCSSRendering::PaintBackgroundWithSC
nsCSSRendering::PaintBackground
nsDisplayBackgroundImage::PaintInternal
mozilla::FrameLayerBuilder::PaintItems
mozilla::FrameLayerBuilder::DrawThebesLayer
mozilla::layers::BasicThebesLayer::PaintBuffer
mozilla::layers::BasicThebesLayer::Validate
mozilla::layers::BasicContainerLayer::Validate
mozilla::layers::BasicContainerLayer::Validate
mozilla::layers::BasicLayerManager::EndTransactionInternal
nsDisplayList::PaintForFrame
- 22,39% js::jit::Assembler::executableCopy
js::jit::JitCode::copyFrom
js::jit::Linker::newCode<(js::AllowGC)1>
js::jit::ICStubCompiler::getStubCode
js::jit::ICCallScriptedCompiler::getStub
js::jit::TryAttachCallStub
js::jit::DoCallFallback
0x7f5e847ffdb0
- 21,73% js::NewObjectWithClassProtoCommon
js::jit::NewInitObject
js::jit::DoNewObject
0x7f5e84800ec4
+ 20,30% updatewindow
+ 15,70% arena_dalloc
+ 12,82% arena_malloc
+ 12,73% js::gc::Chunk::fetchNextDecommittedArena
+ 7,31% skb_copy_datagram_iovec
+ 7,04% sse2_blt
+ 4,45% NS_CopySegmentToBuffer
+ 4,18% js::InflateString
+ 4,18% mozilla::VectorBase<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy, js::Vector<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy> >::growStorageBy
+ 4,13% mozilla::image::imgFrame::Init
+ 3,86% _cairo_image_surface_span
+ 3,78% sse2_fill
+ 4,25% do_wp_page
+ 2,50% alloc_pages_current
+ 0,32% ImageDecoder #3 libz.so.1.2.8 [.] inflate_fast
And if you press 'V' you get the DSO name right after the symbol name:
+ 0,32% firefox [k] clear_page_c_e
+ 0,26% firefox [k] copy_user_enhanced_fast_string
+ 0,26% firefox [k] page_fault
+ 0,20% firefox [k] get_page_from_freelist
+ 0,20% firefox [k] __list_del_entry
+ 0,20% firefox [k] mem_cgroup_page_lruvec
+ 0,16% firefox [k] lookup_page_cgroup_used
- 0,14% firefox [k] _raw_spin_lock
- _raw_spin_lock [kernel.vmlinux]
- 37,73% try_to_wake_up [kernel.vmlinux]
- 64,20% wake_up_state [kernel.vmlinux]
wake_futex [kernel.vmlinux]
futex_wake [kernel.vmlinux]
do_futex [kernel.vmlinux]
sys_futex [kernel.vmlinux]
system_call [kernel.vmlinux]
- __lll_unlock_wake libpthread-2.18.so
37,39% 0x7f5e81b10400 libxul.so
32,07% 0x7f5e63e81bd0 unknown
- 30,54% arena_malloc firefox
malloc firefox
nsStringBuffer::Alloc libxul.so
nsAString_internal::MutatePrep libxul.so
nsAString_internal::ReplacePrepInternal libxul.so
+ 35,80% default_wake_function [kernel.vmlinux]
- 17,55% handle_mm_fault [kernel.vmlinux]
__do_page_fault [kernel.vmlinux]
do_page_fault [kernel.vmlinux]
- page_fault [kernel.vmlinux]
- 52,88% js::gc::Chunk::fetchNextDecommittedArena libxul.so
js::gc::Chunk::allocateArena libxul.so
js::gc::ArenaLists::allocateFromArenaInline libxul.so
js::gc::ArenaLists::refillFreeList<(js::AllowGC)1> libxul.so
js::PropertyTree::getChild libxul.so
JSObject::getChildProperty libxul.so
JSObject::addPropertyInternal<(js::ExecutionMode)0> libxul.so
JSObject::putProperty<(js::ExecutionMode)0> libxul.so
DefinePropertyOrElement<(js::ExecutionMode)0> libxul.so
js::DefineNativeProperty libxul.so
JSObject::defineGeneric libxul.so
js::DefineFunction libxul.so
JS_DefineFunctions libxul.so
js::GlobalObject::resolveConstructor libxul.so
js::GetBuiltinPrototype libxul.so
js::NewDenseUnallocatedArray libxul.so
js::XDRObjectLiteral<(js::XDRMode)1> libxul.so
js::XDRScript<(js::XDRMode)1> libxul.so
js::XDRState<(js::XDRMode)1>::codeScript libxul.so
JS_DecodeScript libxul.so
ReadCachedScript libxul.so
mozJSComponentLoader::ObjectForLocation libxul.so
mozJSComponentLoader::ImportInto libxul.so
mozJSComponentLoader::Import libxul.so
nsXPCComponents_Utils::Import libxul.so
NS_InvokeByIndex libxul.so
XPCWrappedNative::CallMethod libxul.so
XPC_WN_CallMethod libxul.so
js::Invoke libxul.so
- 47,12% arena_dalloc firefox
nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::ShiftData libxul.so
SnowWhiteKiller::~SnowWhiteKiller libxul.so
nsCycleCollector::FreeSnowWhite libxul.so
nsCycleCollector::Shutdown libxul.so
nsCycleCollector_shutdown libxul.so
mozilla::ShutdownXPCOM libxul.so
ScopedXPCOMStartup::~ScopedXPCOMStartup libxul.so
XREMain::XRE_main libxul.so
XRE_main libxul.so
do_main firefox
main firefox
__libc_start_main libc-2.18.so
_start firefox
+ 9,54% do_futex [kernel.vmlinux]
- Arnaldo
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-01-21 14:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
2015-01-20 18:40 ` Rick Jones
2015-01-21 9:42 ` AKel
2015-01-21 12:40 ` Arnaldo Carvalho de Melo
2015-01-21 14:56 ` Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).