From mboxrd@z Thu Jan 1 00:00:00 1970 From: Milian Wolff Subject: Re: Perf event for Wall-time based sampling? Date: Fri, 19 Sep 2014 10:11:21 +0200 Message-ID: <5770573.p6hpIeRPQV@milian-kdab2> References: <2221771.b2oSN5LR6X@milian-kdab2> <541B3DED.1010306@gmail.com> <20140918203625.GM2770@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7Bit Return-path: Received: from dd17628.kasserver.com ([85.13.138.83]:60450 "EHLO dd17628.kasserver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751223AbaISILZ (ORCPT ); Fri, 19 Sep 2014 04:11:25 -0400 In-Reply-To: <20140918203625.GM2770@kernel.org> Sender: linux-perf-users-owner@vger.kernel.org List-ID: To: Arnaldo Carvalho de Melo Cc: David Ahern , linux-perf-users , Namhyung Kim , Ingo Molnar , Joseph Schuchart On Thursday 18 September 2014 17:36:25 Arnaldo Carvalho de Melo wrote: > Em Thu, Sep 18, 2014 at 02:17:49PM -0600, David Ahern escreveu: > > On 9/18/14, 1:17 PM, Arnaldo Carvalho de Melo wrote: > > >>This was also why I asked my initial question, which I want to repeat > > >>once > > >> > > >>>more: Is there a technical reason to not offer a "timer" software event > > >>>to > > >>>perf? I'm a complete layman when it comes to Kernel internals, but from > > >>>a user point of view this would be awesome: > > >>> > > >>>perf record --call-graph dwarf -e sw-timer -F 100 someapplication > > >>> > > >>>This command would then create a timer in the kernel with a 100Hz > > >>>frequency. Whenever it fires, the callgraphs of all threads in > > >>>$someapplication are sampled and written to perf.data. Is this > > >>>technically not feasible? Or is it simply not implemented? > > >>>I'm experimenting with a libunwind based profiler, and with some ugly > > >>>signal hackery I can now grab backtraces by sending my application > > >>>SIGUSR1. Based on> > > > >Humm, can't you do the same thing with perf? I.e. you send SIGUSR1 to > > >your app with the frequency you want, and then hook a 'perf probe' into > > >your signal... /me tries some stuff, will get back with results... That is actually a very good idea. With the more powerful scripting abilities in perf now, that could/should do the job indeed. I'll also try this out. > > Current profiling options with perf require the process to be running. > > What > > Ok, so you want to see what is the wait channel and unwind the stack > from there? Is that the case? I.e. again, a sysrq-t equivalent? Hey again :) If I'm not mistaken, I could not yet bring my point across. The final goal is to profile both, wait time _and_ CPU time, combined! By sampling a userspace program with a constant frequency and some statistics one can get extremely useful information out of the data. I want to use perf to automate the process that Mike Dunlavey explains here: http://stackoverflow.com/a/378024/35250 So yes, I want to see the wait channel and unwind from there, if the process is waiting. Otherwise just do the normal unwinding you'd do when any other perf event occurs. > > Milian want is to grab samples every timer expiration even if process is > > not running. > > What for? And by "grab samples" you want to know where it is waiting for > something, together with its callchain? See above. If I can sample the callchain every N ms, preferrably in a per- thread manner, I can create tools which find the slowest userspace functions. This is based on "inclusive" cost, which is made up out of CPU time _and_ wait time combined. With it one will find all of the following: - CPU hotspots - IO wait time - lock contention in a multi-threaded application > > Any limitations that would prevent doing this with a sw event? e.g, mimic > > task-clock just don't disable the timer when the task is scheduled out. > > I'm just trying to figure out if what people want is a complete > backtrace of all threads in a process, no matter what they are doing, at > some given time, i.e. at "sysrq-t" time, is that the case? Yes, that sounds correct to me. I tried it out on my system, but the output in dmesg is far too large and I could not find the call stack information of my test application while it slept. Looking at the other backtraces I see there, I'm not sure whether sysrq-t only outputs a kernel-backtrace? Or maybe it's because I'm missing debug symbols for these applications? [167576.049113] mysqld S 0000000000000000 0 17023 1 0x00000000 [167576.049115] ffff8801222a7ad0 0000000000000082 ffff8800aa8c9460 0000000000014580 [167576.049117] ffff8801222a7fd8 0000000000014580 ffff8800aa8c9460 ffff8801222a7a18 [167576.049119] ffffffff812a2640 ffff8800bb975c40 ffff880237cd4ef0 ffff8801222a7b20 [167576.049121] Call Trace: [167576.049124] [] ? cpumask_next_and+0x30/0x50 [167576.049126] [] ? add_wait_queue+0x44/0x50 [167576.049128] [] schedule+0x29/0x70 [167576.049130] [] schedule_hrtimeout_range_clock+0x3d/0x60 [167576.049132] [] schedule_hrtimeout_range+0x13/0x20 [167576.049134] [] poll_schedule_timeout+0x49/0x70 [167576.049136] [] do_sys_poll+0x4be/0x570 [167576.049138] [] ? __alloc_pages_nodemask+0x180/0xc20 [167576.049140] [] ? alloc_pid+0x2e/0x4c0 [167576.049142] [] ? poll_select_copy_remaining+0x150/0x150 [167576.049145] [] ? cpuacct_charge+0x50/0x60 [167576.049147] [] ? kmem_cache_alloc+0x143/0x170 [167576.049149] [] ? alloc_pid+0x2e/0x4c0 [167576.049151] [] ? __enqueue_entity+0x78/0x80 [167576.049153] [] ? enqueue_entity+0x24e/0xaa0 [167576.049155] [] ? cpumask_next_and+0x30/0x50 [167576.049158] [] ? enqueue_task_fair+0x10d/0x5b0 [167576.049160] [] ? native_smp_send_reschedule+0x4d/0x70 [167576.049162] [] ? resched_task+0xc0/0xd0 [167576.049165] [] ? check_preempt_curr+0x57/0x90 [167576.049167] [] ? wake_up_new_task+0x11b/0x1c0 [167576.049169] [] ? do_fork+0x136/0x3f0 [167576.049171] [] ? __fget_light+0x25/0x70 [167576.049173] [] SyS_poll+0x97/0x120 [167576.049176] [] ? system_call_fastpath+0x16/0x1b [167576.049178] [] system_call_fastpath+0x16/0x1b I'm interested in the the other side of the fence. The backtrace could/should stop at system_call_fastpath. Bye -- Milian Wolff mail@milianw.de http://milianw.de