* Using perf to generate a call stack from a kernel function to the user space caller @ 2015-01-20 17:08 AKel 2015-01-20 18:40 ` Rick Jones 0 siblings, 1 reply; 5+ messages in thread From: AKel @ 2015-01-20 17:08 UTC (permalink / raw) To: linux-perf-users Hi, I'm trying to profile an application which is spending quite a bit of time in the kernel (~40% in do_raw_spin_lock for example). To fully understand what is happening I would like to generate a call stack using perf (or otherwise) to see where the calls to this function are originating from. Is this possible? My hands are slightly tied in that I don't have sudo access to the system I'm working on, so a solution which accounts for this would be greatly appreciated. Even transferring files to/from the system is very difficult so I'm rather restricted in what i can do. Thanks, A ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel @ 2015-01-20 18:40 ` Rick Jones 2015-01-21 9:42 ` AKel 0 siblings, 1 reply; 5+ messages in thread From: Rick Jones @ 2015-01-20 18:40 UTC (permalink / raw) To: AKel, linux-perf-users On 01/20/2015 09:08 AM, AKel wrote: > Hi, > > I'm trying to profile an application which is spending quite a bit of time > in the kernel (~40% in do_raw_spin_lock for example). To fully understand > what is happening I would like to generate a call stack using perf (or > otherwise) to see where the calls to this function are originating from. > Is this possible? My hands are slightly tied in that I don't have sudo > access to the system I'm working on, so a solution which accounts for this > would be greatly appreciated. Even transferring files to/from the system > is very difficult so I'm rather restricted in what i can do. I suspect you will have to resolve the sudo issue by getting sudo. Ignoring that for the time-being, it sounds like you could use the call-graph functionality of perf record/report/top? rick jones ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-20 18:40 ` Rick Jones @ 2015-01-21 9:42 ` AKel 2015-01-21 12:40 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 5+ messages in thread From: AKel @ 2015-01-21 9:42 UTC (permalink / raw) To: linux-perf-users Thanks Rick for your reply. Unfortunately I'm in the big bad corporate world where a simple minion such as myself is not granted sudo access. I've tried passing the callgraph (-g) flag to perf, however I can only recover the call stack within the kernel, it begins at what I suspect is the kernel entry point. Ideally I could generate a complete call stack. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-21 9:42 ` AKel @ 2015-01-21 12:40 ` Arnaldo Carvalho de Melo 2015-01-21 14:56 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 5+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-01-21 12:40 UTC (permalink / raw) To: AKel; +Cc: linux-perf-users Em Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel escreveu: > Thanks Rick for your reply. Unfortunately I'm in the big bad corporate > world where a simple minion such as myself is not granted sudo access. That should not be a problem in most of the cases for profiling your own apps. > I've tried passing the callgraph (-g) flag to perf, however I can only > recover the call stack within the kernel, it begins at what I suspect is Because it will try, by default, to collect %bp (frame pointer) callchains, which unfortunately is not usually enabled for user space programs, that use the %bp register for other purposes. > the kernel entry point. Ideally I could generate a complete call stack. You can try using the DWARF CFI (Call Frame Information), by using instead: $ perf record --call-graph dwarf firefox Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few callgraphs: $ perf report --no-children # expand callgraphs, press P, press q $ head -45 perf.hist.0 - 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv _ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE _ZN7mozilla5image11ScaleRunner3RunEv _ZN8nsThread16ProcessNextEventEbPb _Z19NS_ProcessNextEventP9nsIThreadb _ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE _ZN11MessageLoop3RunEv _ZN8nsThread10ThreadFuncEPv _pt_root start_thread __clone - 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back - __memmove_ssse3_back - 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_ - 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev _ZN7mozilla3net10CacheIndex12ParseRecordsEv _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult _ZN7mozilla3net9ReadEvent3RunEv _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj _ZN7mozilla3net13CacheIOThread10ThreadFuncEv _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv _pt_root start_thread __clone - 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev _ZN7mozilla3net10CacheIndex12ParseRecordsEv _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult _ZN7mozilla3net9ReadEvent3RunEv _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj _ZN7mozilla3net13CacheIOThread10ThreadFuncEv _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv _pt_root start_thread __clone + 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60 + 1,79% firefox libz.so.1.2.8 [.] inflate_fast + 1,76% Cache2 I/O libxul.so [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60 + 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock + 1,37% firefox firefox [.] arena_dalloc $ These C++ methods that needs unmangling, and I have the firefox-debuginfo package matching the binary... Have to check this. Anyway, either you have your binaries (your main binary and the libraries it uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can use '-g', that, in its default form is equivalent to '--call-graph fp', or you will have to ask that the kernel collects stack dumps to then use CFI to do the callchains, i.e. use '--call-graph dwarf'. You may want to state how much stack is collected when using --call-graph dwarf, please look at the 'record' help: -g enables call-graph recording --call-graph <mode[,dump_size]> setup and enables call-graph (stack chain/backtrace) recording: fp dwarf Ah, here is another example, this time crossing from kernel to userspace, all the way back from a spin lock operation to the memory allocation routines used by firefox, after I pressed -> and zoomed into the kernel DSO: [acme@zoo linux]$ head -50 perf.hist.1 + 0,32% firefox [k] clear_page_c_e + 0,26% firefox [k] copy_user_enhanced_fast_string + 0,26% firefox [k] page_fault + 0,20% firefox [k] get_page_from_freelist + 0,20% firefox [k] __list_del_entry + 0,20% firefox [k] mem_cgroup_page_lruvec + 0,16% firefox [k] lookup_page_cgroup_used - 0,14% firefox [k] _raw_spin_lock - _raw_spin_lock - 37,73% try_to_wake_up - 64,20% wake_up_state wake_futex futex_wake do_futex sys_futex system_call - __lll_unlock_wake 37,39% 0x7f5e81b10400 32,07% 0x7f5e63e81bd0 - 30,54% arena_malloc malloc _ZN14nsStringBuffer5AllocEm _ZN18nsAString_internal10MutatePrepEjPPDsPj _ZN18nsAString_internal19ReplacePrepInternalEjjjj + 35,80% default_wake_function - 17,55% handle_mm_fault __do_page_fault do_page_fault - page_fault + 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv - 47,12% arena_dalloc _ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm _ZN15SnowWhiteKillerD2Ev _ZN16nsCycleCollector13FreeSnowWhiteEb _ZN16nsCycleCollector8ShutdownEv _Z25nsCycleCollector_shutdownv _ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager _ZN18ScopedXPCOMStartupD2Ev _ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData XRE_main _ZL7do_mainiPPcP7nsIFile main __libc_start_main _start + 9,54% do_futex + 9,36% do_cow_fault + 9,33% tick_do_update_jiffies64 + 8,72% futex_wait_setup + 7,62% unix_stream_recvmsg + 0,14% firefox [k] mem_cgroup_begin_page_stat [acme@zoo linux]$ - Arnaldo ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-21 12:40 ` Arnaldo Carvalho de Melo @ 2015-01-21 14:56 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 5+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-01-21 14:56 UTC (permalink / raw) To: AKel; +Cc: linux-perf-users Em Wed, Jan 21, 2015 at 09:40:21AM -0300, Arnaldo Carvalho de Melo escreveu: > Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few > callgraphs: > $ perf report --no-children # expand callgraphs, press P, press q > $ head -45 perf.hist.0 > - 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb > _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb > _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv > _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv <SNIP> > These C++ methods that needs unmangling, and I have the firefox-debuginfo > package matching the binary... Have to check this. [acme@zoo linux]$ make -C tools/perf O=/tmp/build/perf install-bin make: Entering directory `/home/git/linux/tools/perf' BUILD: Doing 'make -j4' parallel build config/Makefile:597: No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demangling Auto-detecting system features: ... dwarf: [ on ] ... glibc: [ on ] ... gtk2: [ on ] ... libaudit: [ on ] ... libbfd: [ OFF ] after installing binutils-devel it gets symbol demangling: - 3,47% Image Scaler libxul.so [.] skia::BGRAConvolve2D skia::BGRAConvolve2D skia::ImageOperations::ResizeBasic skia::ImageOperations::Resize skia::ImageOperations::Resize mozilla::gfx::Scale mozilla::image::ScaleRunner::Run nsThread::ProcessNextEvent NS_ProcessNextEvent mozilla::ipc::MessagePumpForNonMainThreads::Run MessageLoop::Run nsThread::ThreadFunc _pt_root start_thread __clone + 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back + 1,79% firefox libz.so.1.2.8 [.] inflate_fast + 1,76% Cache2 I/O libxul.so [.] nsTArray_Impl<mozilla::net::CacheIndexRecord*, nsTArrayInfallibleAllocator>::RemoveElement<mozilla::net::CacheIndexRecord*, nsDefaultComparator<mozilla::net::CacheIndexRecord*, mozilla::net::CacheIndexRecord*> > + 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock + 1,37% firefox firefox [.] arena_dalloc + 1,27% firefox libpthread-2.18.so [.] pthread_mutex_unlock + 1,02% firefox firefox [.] arena_malloc + 0,68% firefox libxul.so [.] PLDHashTable::SearchTable + 0,58% firefox libxul.so [.] js::AtomizeChars<unsigned char> + 0,55% firefox libxul.so [.] Interpret + 0,52% firefox libxul.so [.] js::ObjectImpl::nativeLookup + 0,49% firefox libc-2.18.so [.] __memcpy_sse2_unaligned + 0,47% firefox libxul.so [.] NewObject + 0,45% firefox p11-kit-trust.so [.] lookup_or_create_bucket + 0,41% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_lock + 0,39% firefox libxul.so [.] js::EmptyShape::getInitialShape + 0,38% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_unlock + 0,35% ImageDecoder #2 libz.so.1.2.8 [.] inflate_fast + 0,35% firefox libxul.so [.] mozilla::gfx::AlphaBoxBlur::BoxBlur_SSE2 + 0,34% firefox ld-2.18.so [.] do_lookup_x + 0,34% firefox libxul.so [.] JSObject::putProperty<(js::ExecutionMode)0> + 0,34% firefox libxul.so [.] js::ShapeTable::search + 0,33% firefox libxul.so [.] js::baseops::GetProperty - 0,32% firefox [kernel.vmlinux] [k] clear_page_c_e clear_page_c_e - __alloc_pages_nodemask - 97,50% alloc_pages_vma - 95,75% handle_mm_fault __do_page_fault do_page_fault - page_fault - 19,82% __memcpy_sse2_unaligned - 35,58% _XRead _XReadPad XGetImage _get_image_surface _cairo_xlib_surface_acquire_source_image _cairo_surface_acquire_source_image _pixman_image_for_pattern _clip_and_composite_boxes _cairo_image_surface_paint _cairo_surface_paint _cairo_gstate_paint INT__moz_cairo_paint _moz_cairo_paint_with_alpha mozilla::gfx::DrawTargetCairo::DrawSurface mozilla::image::imgFrame::LockImageData mozilla::image::RasterImage::RequestScale mozilla::image::RasterImage::OptimalImageSizeForDest DrawImageInternal nsLayoutUtils::DrawBackgroundImage nsImageRenderer::Draw nsImageRenderer::DrawBackground nsCSSRendering::PaintBackgroundWithSC nsCSSRendering::PaintBackground nsDisplayBackgroundImage::PaintInternal mozilla::FrameLayerBuilder::PaintItems mozilla::FrameLayerBuilder::DrawThebesLayer mozilla::layers::BasicThebesLayer::PaintBuffer mozilla::layers::BasicThebesLayer::Validate mozilla::layers::BasicContainerLayer::Validate mozilla::layers::BasicContainerLayer::Validate mozilla::layers::BasicLayerManager::EndTransactionInternal nsDisplayList::PaintForFrame - 22,39% js::jit::Assembler::executableCopy js::jit::JitCode::copyFrom js::jit::Linker::newCode<(js::AllowGC)1> js::jit::ICStubCompiler::getStubCode js::jit::ICCallScriptedCompiler::getStub js::jit::TryAttachCallStub js::jit::DoCallFallback 0x7f5e847ffdb0 - 21,73% js::NewObjectWithClassProtoCommon js::jit::NewInitObject js::jit::DoNewObject 0x7f5e84800ec4 + 20,30% updatewindow + 15,70% arena_dalloc + 12,82% arena_malloc + 12,73% js::gc::Chunk::fetchNextDecommittedArena + 7,31% skb_copy_datagram_iovec + 7,04% sse2_blt + 4,45% NS_CopySegmentToBuffer + 4,18% js::InflateString + 4,18% mozilla::VectorBase<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy, js::Vector<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy> >::growStorageBy + 4,13% mozilla::image::imgFrame::Init + 3,86% _cairo_image_surface_span + 3,78% sse2_fill + 4,25% do_wp_page + 2,50% alloc_pages_current + 0,32% ImageDecoder #3 libz.so.1.2.8 [.] inflate_fast And if you press 'V' you get the DSO name right after the symbol name: + 0,32% firefox [k] clear_page_c_e + 0,26% firefox [k] copy_user_enhanced_fast_string + 0,26% firefox [k] page_fault + 0,20% firefox [k] get_page_from_freelist + 0,20% firefox [k] __list_del_entry + 0,20% firefox [k] mem_cgroup_page_lruvec + 0,16% firefox [k] lookup_page_cgroup_used - 0,14% firefox [k] _raw_spin_lock - _raw_spin_lock [kernel.vmlinux] - 37,73% try_to_wake_up [kernel.vmlinux] - 64,20% wake_up_state [kernel.vmlinux] wake_futex [kernel.vmlinux] futex_wake [kernel.vmlinux] do_futex [kernel.vmlinux] sys_futex [kernel.vmlinux] system_call [kernel.vmlinux] - __lll_unlock_wake libpthread-2.18.so 37,39% 0x7f5e81b10400 libxul.so 32,07% 0x7f5e63e81bd0 unknown - 30,54% arena_malloc firefox malloc firefox nsStringBuffer::Alloc libxul.so nsAString_internal::MutatePrep libxul.so nsAString_internal::ReplacePrepInternal libxul.so + 35,80% default_wake_function [kernel.vmlinux] - 17,55% handle_mm_fault [kernel.vmlinux] __do_page_fault [kernel.vmlinux] do_page_fault [kernel.vmlinux] - page_fault [kernel.vmlinux] - 52,88% js::gc::Chunk::fetchNextDecommittedArena libxul.so js::gc::Chunk::allocateArena libxul.so js::gc::ArenaLists::allocateFromArenaInline libxul.so js::gc::ArenaLists::refillFreeList<(js::AllowGC)1> libxul.so js::PropertyTree::getChild libxul.so JSObject::getChildProperty libxul.so JSObject::addPropertyInternal<(js::ExecutionMode)0> libxul.so JSObject::putProperty<(js::ExecutionMode)0> libxul.so DefinePropertyOrElement<(js::ExecutionMode)0> libxul.so js::DefineNativeProperty libxul.so JSObject::defineGeneric libxul.so js::DefineFunction libxul.so JS_DefineFunctions libxul.so js::GlobalObject::resolveConstructor libxul.so js::GetBuiltinPrototype libxul.so js::NewDenseUnallocatedArray libxul.so js::XDRObjectLiteral<(js::XDRMode)1> libxul.so js::XDRScript<(js::XDRMode)1> libxul.so js::XDRState<(js::XDRMode)1>::codeScript libxul.so JS_DecodeScript libxul.so ReadCachedScript libxul.so mozJSComponentLoader::ObjectForLocation libxul.so mozJSComponentLoader::ImportInto libxul.so mozJSComponentLoader::Import libxul.so nsXPCComponents_Utils::Import libxul.so NS_InvokeByIndex libxul.so XPCWrappedNative::CallMethod libxul.so XPC_WN_CallMethod libxul.so js::Invoke libxul.so - 47,12% arena_dalloc firefox nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::ShiftData libxul.so SnowWhiteKiller::~SnowWhiteKiller libxul.so nsCycleCollector::FreeSnowWhite libxul.so nsCycleCollector::Shutdown libxul.so nsCycleCollector_shutdown libxul.so mozilla::ShutdownXPCOM libxul.so ScopedXPCOMStartup::~ScopedXPCOMStartup libxul.so XREMain::XRE_main libxul.so XRE_main libxul.so do_main firefox main firefox __libc_start_main libc-2.18.so _start firefox + 9,54% do_futex [kernel.vmlinux] - Arnaldo ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-01-21 14:56 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel 2015-01-20 18:40 ` Rick Jones 2015-01-21 9:42 ` AKel 2015-01-21 12:40 ` Arnaldo Carvalho de Melo 2015-01-21 14:56 ` Arnaldo Carvalho de Melo
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.