* Using perf to generate a call stack from a kernel function to the user space caller @ 2015-01-20 17:08 AKel 2015-01-20 18:40 ` Rick Jones 0 siblings, 1 reply; 5+ messages in thread From: AKel @ 2015-01-20 17:08 UTC (permalink / raw) To: linux-perf-users Hi, I'm trying to profile an application which is spending quite a bit of time in the kernel (~40% in do_raw_spin_lock for example). To fully understand what is happening I would like to generate a call stack using perf (or otherwise) to see where the calls to this function are originating from. Is this possible? My hands are slightly tied in that I don't have sudo access to the system I'm working on, so a solution which accounts for this would be greatly appreciated. Even transferring files to/from the system is very difficult so I'm rather restricted in what i can do. Thanks, A ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel @ 2015-01-20 18:40 ` Rick Jones 2015-01-21 9:42 ` AKel 0 siblings, 1 reply; 5+ messages in thread From: Rick Jones @ 2015-01-20 18:40 UTC (permalink / raw) To: AKel, linux-perf-users On 01/20/2015 09:08 AM, AKel wrote: > Hi, > > I'm trying to profile an application which is spending quite a bit of time > in the kernel (~40% in do_raw_spin_lock for example). To fully understand > what is happening I would like to generate a call stack using perf (or > otherwise) to see where the calls to this function are originating from. > Is this possible? My hands are slightly tied in that I don't have sudo > access to the system I'm working on, so a solution which accounts for this > would be greatly appreciated. Even transferring files to/from the system > is very difficult so I'm rather restricted in what i can do. I suspect you will have to resolve the sudo issue by getting sudo. Ignoring that for the time-being, it sounds like you could use the call-graph functionality of perf record/report/top? rick jones ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-20 18:40 ` Rick Jones @ 2015-01-21 9:42 ` AKel 2015-01-21 12:40 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 5+ messages in thread From: AKel @ 2015-01-21 9:42 UTC (permalink / raw) To: linux-perf-users Thanks Rick for your reply. Unfortunately I'm in the big bad corporate world where a simple minion such as myself is not granted sudo access. I've tried passing the callgraph (-g) flag to perf, however I can only recover the call stack within the kernel, it begins at what I suspect is the kernel entry point. Ideally I could generate a complete call stack. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-21 9:42 ` AKel @ 2015-01-21 12:40 ` Arnaldo Carvalho de Melo 2015-01-21 14:56 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 5+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-01-21 12:40 UTC (permalink / raw) To: AKel; +Cc: linux-perf-users Em Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel escreveu: > Thanks Rick for your reply. Unfortunately I'm in the big bad corporate > world where a simple minion such as myself is not granted sudo access. That should not be a problem in most of the cases for profiling your own apps. > I've tried passing the callgraph (-g) flag to perf, however I can only > recover the call stack within the kernel, it begins at what I suspect is Because it will try, by default, to collect %bp (frame pointer) callchains, which unfortunately is not usually enabled for user space programs, that use the %bp register for other purposes. > the kernel entry point. Ideally I could generate a complete call stack. You can try using the DWARF CFI (Call Frame Information), by using instead: $ perf record --call-graph dwarf firefox Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few callgraphs: $ perf report --no-children # expand callgraphs, press P, press q $ head -45 perf.hist.0 - 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv _ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE _ZN7mozilla5image11ScaleRunner3RunEv _ZN8nsThread16ProcessNextEventEbPb _Z19NS_ProcessNextEventP9nsIThreadb _ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE _ZN11MessageLoop3RunEv _ZN8nsThread10ThreadFuncEPv _pt_root start_thread __clone - 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back - __memmove_ssse3_back - 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_ - 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev _ZN7mozilla3net10CacheIndex12ParseRecordsEv _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult _ZN7mozilla3net9ReadEvent3RunEv _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj _ZN7mozilla3net13CacheIOThread10ThreadFuncEv _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv _pt_root start_thread __clone - 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev _ZN7mozilla3net10CacheIndex12ParseRecordsEv _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult _ZN7mozilla3net9ReadEvent3RunEv _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj _ZN7mozilla3net13CacheIOThread10ThreadFuncEv _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv _pt_root start_thread __clone + 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60 + 1,79% firefox libz.so.1.2.8 [.] inflate_fast + 1,76% Cache2 I/O libxul.so [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60 + 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock + 1,37% firefox firefox [.] arena_dalloc $ These C++ methods that needs unmangling, and I have the firefox-debuginfo package matching the binary... Have to check this. Anyway, either you have your binaries (your main binary and the libraries it uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can use '-g', that, in its default form is equivalent to '--call-graph fp', or you will have to ask that the kernel collects stack dumps to then use CFI to do the callchains, i.e. use '--call-graph dwarf'. You may want to state how much stack is collected when using --call-graph dwarf, please look at the 'record' help: -g enables call-graph recording --call-graph <mode[,dump_size]> setup and enables call-graph (stack chain/backtrace) recording: fp dwarf Ah, here is another example, this time crossing from kernel to userspace, all the way back from a spin lock operation to the memory allocation routines used by firefox, after I pressed -> and zoomed into the kernel DSO: [acme@zoo linux]$ head -50 perf.hist.1 + 0,32% firefox [k] clear_page_c_e + 0,26% firefox [k] copy_user_enhanced_fast_string + 0,26% firefox [k] page_fault + 0,20% firefox [k] get_page_from_freelist + 0,20% firefox [k] __list_del_entry + 0,20% firefox [k] mem_cgroup_page_lruvec + 0,16% firefox [k] lookup_page_cgroup_used - 0,14% firefox [k] _raw_spin_lock - _raw_spin_lock - 37,73% try_to_wake_up - 64,20% wake_up_state wake_futex futex_wake do_futex sys_futex system_call - __lll_unlock_wake 37,39% 0x7f5e81b10400 32,07% 0x7f5e63e81bd0 - 30,54% arena_malloc malloc _ZN14nsStringBuffer5AllocEm _ZN18nsAString_internal10MutatePrepEjPPDsPj _ZN18nsAString_internal19ReplacePrepInternalEjjjj + 35,80% default_wake_function - 17,55% handle_mm_fault __do_page_fault do_page_fault - page_fault + 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv - 47,12% arena_dalloc _ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm _ZN15SnowWhiteKillerD2Ev _ZN16nsCycleCollector13FreeSnowWhiteEb _ZN16nsCycleCollector8ShutdownEv _Z25nsCycleCollector_shutdownv _ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager _ZN18ScopedXPCOMStartupD2Ev _ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData XRE_main _ZL7do_mainiPPcP7nsIFile main __libc_start_main _start + 9,54% do_futex + 9,36% do_cow_fault + 9,33% tick_do_update_jiffies64 + 8,72% futex_wait_setup + 7,62% unix_stream_recvmsg + 0,14% firefox [k] mem_cgroup_begin_page_stat [acme@zoo linux]$ - Arnaldo ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller 2015-01-21 12:40 ` Arnaldo Carvalho de Melo @ 2015-01-21 14:56 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 5+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-01-21 14:56 UTC (permalink / raw) To: AKel; +Cc: linux-perf-users Em Wed, Jan 21, 2015 at 09:40:21AM -0300, Arnaldo Carvalho de Melo escreveu: > Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few > callgraphs: > $ perf report --no-children # expand callgraphs, press P, press q > $ head -45 perf.hist.0 > - 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb > _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb > _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv > _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv <SNIP> > These C++ methods that needs unmangling, and I have the firefox-debuginfo > package matching the binary... Have to check this. [acme@zoo linux]$ make -C tools/perf O=/tmp/build/perf install-bin make: Entering directory `/home/git/linux/tools/perf' BUILD: Doing 'make -j4' parallel build config/Makefile:597: No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demangling Auto-detecting system features: ... dwarf: [ on ] ... glibc: [ on ] ... gtk2: [ on ] ... libaudit: [ on ] ... libbfd: [ OFF ] after installing binutils-devel it gets symbol demangling: - 3,47% Image Scaler libxul.so [.] skia::BGRAConvolve2D skia::BGRAConvolve2D skia::ImageOperations::ResizeBasic skia::ImageOperations::Resize skia::ImageOperations::Resize mozilla::gfx::Scale mozilla::image::ScaleRunner::Run nsThread::ProcessNextEvent NS_ProcessNextEvent mozilla::ipc::MessagePumpForNonMainThreads::Run MessageLoop::Run nsThread::ThreadFunc _pt_root start_thread __clone + 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back + 1,79% firefox libz.so.1.2.8 [.] inflate_fast + 1,76% Cache2 I/O libxul.so [.] nsTArray_Impl<mozilla::net::CacheIndexRecord*, nsTArrayInfallibleAllocator>::RemoveElement<mozilla::net::CacheIndexRecord*, nsDefaultComparator<mozilla::net::CacheIndexRecord*, mozilla::net::CacheIndexRecord*> > + 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock + 1,37% firefox firefox [.] arena_dalloc + 1,27% firefox libpthread-2.18.so [.] pthread_mutex_unlock + 1,02% firefox firefox [.] arena_malloc + 0,68% firefox libxul.so [.] PLDHashTable::SearchTable + 0,58% firefox libxul.so [.] js::AtomizeChars<unsigned char> + 0,55% firefox libxul.so [.] Interpret + 0,52% firefox libxul.so [.] js::ObjectImpl::nativeLookup + 0,49% firefox libc-2.18.so [.] __memcpy_sse2_unaligned + 0,47% firefox libxul.so [.] NewObject + 0,45% firefox p11-kit-trust.so [.] lookup_or_create_bucket + 0,41% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_lock + 0,39% firefox libxul.so [.] js::EmptyShape::getInitialShape + 0,38% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_unlock + 0,35% ImageDecoder #2 libz.so.1.2.8 [.] inflate_fast + 0,35% firefox libxul.so [.] mozilla::gfx::AlphaBoxBlur::BoxBlur_SSE2 + 0,34% firefox ld-2.18.so [.] do_lookup_x + 0,34% firefox libxul.so [.] JSObject::putProperty<(js::ExecutionMode)0> + 0,34% firefox libxul.so [.] js::ShapeTable::search + 0,33% firefox libxul.so [.] js::baseops::GetProperty - 0,32% firefox [kernel.vmlinux] [k] clear_page_c_e clear_page_c_e - __alloc_pages_nodemask - 97,50% alloc_pages_vma - 95,75% handle_mm_fault __do_page_fault do_page_fault - page_fault - 19,82% __memcpy_sse2_unaligned - 35,58% _XRead _XReadPad XGetImage _get_image_surface _cairo_xlib_surface_acquire_source_image _cairo_surface_acquire_source_image _pixman_image_for_pattern _clip_and_composite_boxes _cairo_image_surface_paint _cairo_surface_paint _cairo_gstate_paint INT__moz_cairo_paint _moz_cairo_paint_with_alpha mozilla::gfx::DrawTargetCairo::DrawSurface mozilla::image::imgFrame::LockImageData mozilla::image::RasterImage::RequestScale mozilla::image::RasterImage::OptimalImageSizeForDest DrawImageInternal nsLayoutUtils::DrawBackgroundImage nsImageRenderer::Draw nsImageRenderer::DrawBackground nsCSSRendering::PaintBackgroundWithSC nsCSSRendering::PaintBackground nsDisplayBackgroundImage::PaintInternal mozilla::FrameLayerBuilder::PaintItems mozilla::FrameLayerBuilder::DrawThebesLayer mozilla::layers::BasicThebesLayer::PaintBuffer mozilla::layers::BasicThebesLayer::Validate mozilla::layers::BasicContainerLayer::Validate mozilla::layers::BasicContainerLayer::Validate mozilla::layers::BasicLayerManager::EndTransactionInternal nsDisplayList::PaintForFrame - 22,39% js::jit::Assembler::executableCopy js::jit::JitCode::copyFrom js::jit::Linker::newCode<(js::AllowGC)1> js::jit::ICStubCompiler::getStubCode js::jit::ICCallScriptedCompiler::getStub js::jit::TryAttachCallStub js::jit::DoCallFallback 0x7f5e847ffdb0 - 21,73% js::NewObjectWithClassProtoCommon js::jit::NewInitObject js::jit::DoNewObject 0x7f5e84800ec4 + 20,30% updatewindow + 15,70% arena_dalloc + 12,82% arena_malloc + 12,73% js::gc::Chunk::fetchNextDecommittedArena + 7,31% skb_copy_datagram_iovec + 7,04% sse2_blt + 4,45% NS_CopySegmentToBuffer + 4,18% js::InflateString + 4,18% mozilla::VectorBase<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy, js::Vector<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy> >::growStorageBy + 4,13% mozilla::image::imgFrame::Init + 3,86% _cairo_image_surface_span + 3,78% sse2_fill + 4,25% do_wp_page + 2,50% alloc_pages_current + 0,32% ImageDecoder #3 libz.so.1.2.8 [.] inflate_fast And if you press 'V' you get the DSO name right after the symbol name: + 0,32% firefox [k] clear_page_c_e + 0,26% firefox [k] copy_user_enhanced_fast_string + 0,26% firefox [k] page_fault + 0,20% firefox [k] get_page_from_freelist + 0,20% firefox [k] __list_del_entry + 0,20% firefox [k] mem_cgroup_page_lruvec + 0,16% firefox [k] lookup_page_cgroup_used - 0,14% firefox [k] _raw_spin_lock - _raw_spin_lock [kernel.vmlinux] - 37,73% try_to_wake_up [kernel.vmlinux] - 64,20% wake_up_state [kernel.vmlinux] wake_futex [kernel.vmlinux] futex_wake [kernel.vmlinux] do_futex [kernel.vmlinux] sys_futex [kernel.vmlinux] system_call [kernel.vmlinux] - __lll_unlock_wake libpthread-2.18.so 37,39% 0x7f5e81b10400 libxul.so 32,07% 0x7f5e63e81bd0 unknown - 30,54% arena_malloc firefox malloc firefox nsStringBuffer::Alloc libxul.so nsAString_internal::MutatePrep libxul.so nsAString_internal::ReplacePrepInternal libxul.so + 35,80% default_wake_function [kernel.vmlinux] - 17,55% handle_mm_fault [kernel.vmlinux] __do_page_fault [kernel.vmlinux] do_page_fault [kernel.vmlinux] - page_fault [kernel.vmlinux] - 52,88% js::gc::Chunk::fetchNextDecommittedArena libxul.so js::gc::Chunk::allocateArena libxul.so js::gc::ArenaLists::allocateFromArenaInline libxul.so js::gc::ArenaLists::refillFreeList<(js::AllowGC)1> libxul.so js::PropertyTree::getChild libxul.so JSObject::getChildProperty libxul.so JSObject::addPropertyInternal<(js::ExecutionMode)0> libxul.so JSObject::putProperty<(js::ExecutionMode)0> libxul.so DefinePropertyOrElement<(js::ExecutionMode)0> libxul.so js::DefineNativeProperty libxul.so JSObject::defineGeneric libxul.so js::DefineFunction libxul.so JS_DefineFunctions libxul.so js::GlobalObject::resolveConstructor libxul.so js::GetBuiltinPrototype libxul.so js::NewDenseUnallocatedArray libxul.so js::XDRObjectLiteral<(js::XDRMode)1> libxul.so js::XDRScript<(js::XDRMode)1> libxul.so js::XDRState<(js::XDRMode)1>::codeScript libxul.so JS_DecodeScript libxul.so ReadCachedScript libxul.so mozJSComponentLoader::ObjectForLocation libxul.so mozJSComponentLoader::ImportInto libxul.so mozJSComponentLoader::Import libxul.so nsXPCComponents_Utils::Import libxul.so NS_InvokeByIndex libxul.so XPCWrappedNative::CallMethod libxul.so XPC_WN_CallMethod libxul.so js::Invoke libxul.so - 47,12% arena_dalloc firefox nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::ShiftData libxul.so SnowWhiteKiller::~SnowWhiteKiller libxul.so nsCycleCollector::FreeSnowWhite libxul.so nsCycleCollector::Shutdown libxul.so nsCycleCollector_shutdown libxul.so mozilla::ShutdownXPCOM libxul.so ScopedXPCOMStartup::~ScopedXPCOMStartup libxul.so XREMain::XRE_main libxul.so XRE_main libxul.so do_main firefox main firefox __libc_start_main libc-2.18.so _start firefox + 9,54% do_futex [kernel.vmlinux] - Arnaldo ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-01-21 14:56 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel 2015-01-20 18:40 ` Rick Jones 2015-01-21 9:42 ` AKel 2015-01-21 12:40 ` Arnaldo Carvalho de Melo 2015-01-21 14:56 ` Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).