From mboxrd@z Thu Jan  1 00:00:00 1970
From: Namhyung Kim
Subject: Re: Perf event for Wall-time based sampling?
Date: Fri, 19 Sep 2014 14:59:55 +0900
Message-ID: <874mw4m9pg.fsf@sejong.aot.lge.com>
References: <2221771.b2oSN5LR6X@milian-kdab2> <2297882.Vc1x1zOfA6@milian-kdab2> <20140918155745.GH2770@kernel.org> <45528931.El8SOGvs6Z@milian-kdab2> <20140918191713.GK2770@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain
Received: from lgeamrelo01.lge.com ([156.147.1.125]:39975 "EHLO lgeamrelo01.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751458AbaISF76 (ORCPT ); Fri, 19 Sep 2014 01:59:58 -0400
In-Reply-To: <20140918191713.GK2770@kernel.org> (Arnaldo Carvalho de Melo's message of "Thu, 18 Sep 2014 16:17:13 -0300")
Sender: linux-perf-users-owner@vger.kernel.org
To: Arnaldo Carvalho de Melo
Cc: Milian Wolff, linux-perf-users, Ingo Molnar, Joseph Schuchart

Hi Arnaldo and Milian,

On Thu, 18 Sep 2014 16:17:13 -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Sep 18, 2014 at 06:37:47PM +0200, Milian Wolff wrote:
>> On Thursday 18 September 2014 12:57:45 Arnaldo Carvalho de Melo wrote:
>> > On Thu, Sep 18, 2014 at 05:26:33PM +0200, Milian Wolff wrote:
>> > > On Thursday 18 September 2014 11:51:24 Arnaldo Carvalho de Melo wrote:
>
>> >
>> > > b) The callgraphs are really strange, imo. Different traces are printed
>> > > with the same cost, which sounds wrong, no? See e.g. the multiple 44.44%
>> > > traces in sched:sched_wakeup.
>
>> > Try using --no-children in the 'report' command line.
>
>> Nice, this is very useful. Many thanks!
>
> np
>
>> > > c) Most of the traces point into the kernel, how can I hide these traces
>> > > and only concentrate on the user-space? Do I have to grep manually for
>> > > [.] ? I
>
>> > Oh well, for userspace you need to be aware of how callchains are
>> > collected, i.e. if your binaries and libraries use
>> > -fomit-frame-pointer, because if they do you will not get callchains
>> > going into userspace, so you will need to specifically ask for 'DWARF'
>> > callchains, from 'perf record' documentation:
>
>> I'm actually aware of that and I did add that option to my initial record
>> call, sorry for not being clear here.
>
>> >
>> > This has to be made automated, i.e. the tooling needs to figure out that
>> > the binaries used do use %bp for optimization and automagically collect
>> > DWARF, but till then, one needs to know about such issues and deal with
>> > it.
>>
>> That would indeed be very welcome. There are multiple "defaults" in perf which
>> I find highly confusing. The --no-children above e.g. could/should probably be
>> the default, no? Similarly, I find it extremely irritating that `perf report -g`
>
> It was, this is something we've actually been discussing recently: the
> change that made --children be the default mode. That is why I added
> Namhyung and Ingo to the CC list, so that they become aware of more
> reaction to this change.

Yeah, we should rethink changing the default now.  Actually I'm okay
with the change.  Ingo, what do you think?

>
>> defaults to `-g fractal` and not `-g graph`.
>>
>> 100% foo
>>   70% bar
>>     70% asdf
>>     30% lalala
>>   30% baz
>>
>> is much harder to interpret than
>>
>> 100% foo
>>   70% bar
>>     49% asdf
>>     21% lalala
>>   30% baz

I also agree with you. :)

>
> But the question then is if this is configurable, if not that would be a
> first step, i.e. making this possible via some ~/.perfconfig change.

Yes, we have record.call-graph and top.call-graph config options now, so
adding a new report.call-graph option should not be difficult.  However,
I think it would be better to make it call-graph.XXX, as it could then be
applied to all other subcommands transparently.

What about something like below?

  [call-graph]
    mode = dwarf
    dump-size = 8192
    print-type = fractal
    order = callee
    threshold = 0.5
    print-limit = 128
    sort-key = function

>
> Later we could advocate changing the default. Or perhaps provide some
> "skins", i.e. config files that could be sourced into ~/.perfconfig so
> that perf mimics the decisions of other profilers, with which people are
> used to.
>
> Kinda like making mutt behave like pine (as I did a long time ago), even
> if just for a while, till one gets used to the "superior" default way of
> doing things of the new tool :-)
>
>> especially for more involved call chains. It took me quite some time to become
>> aware of the ability to pass `-g graph` to get the desired output. KCacheGrind
>> e.g. also defaults to something similar to `-g graph` and only optionally
>> allows the user to get the "relative to parent" cost of `-g fractal`.
>>
>> > User space support is something that as you see, is still rough, we need
>> > people like you trying it, but while it is rough, people tend to avoid
>> > it... :-\
>>
>> Yes. But already perf is extremely useful and I use it a lot. I'm also
>> actively educating people about using it more. I've talked about it at last
>> year's Akademy and Qt Developer Days, and again this year at a profiling
>> workshop at Akademy. Please keep up the good work!
>
> Thanks a lot for doing that!
>
>> > > tried something like `perf report --parent "main"` but that makes no
>> > > difference.
>
>> > > > I would recommend that you take a look at Brendan Gregg's _excellent_
>> > > > tutorials at:
>
>> > > > http://www.brendangregg.com/perf.html
>
>> > > > He will explain all this in way more detail than I briefly skimmed
>> > > > above. :-)
>
>> > > I did that already, but Brendan and the other available Perf documentation
>> > > mostly concentrates on performance issues in the Kernel. I'm interested
>> > > purely in the user space. Perf record with one of the hardware PMU events
>> > > works nicely in that case, but one cannot use it to find locks&waits
>> > > similar to what VTune offers.
>
>> > Humm, yeah, you need to figure out how to solve your issue, what I tried
>> > was to show what kinds of building blocks you could use to build what
>> > you need, but no, there is no ready to use tool for this, that I am
>> > aware of.

I'm also *very* interested in collecting idle/wait info using perf.
Looks like we can somehow use the sched:* tracepoints, but that requires
root privilege (unless /proc/sys/kernel/perf_event_paranoid is -1).

With that restriction, however, we might improve perf sched (or even
plain perf record/report) to provide such info.  David may have an
idea. :)

Thanks,
Namhyung

>
>> > For instance, you need to collect scheduler events, then do some
>> > scripting, perhaps using perl or python, perhaps using the scripting
>> > support that is built into perf already, but yeah, not documented.
>
>> And also lacking the ability to get callgraphs, if I'm not mistaken. This is
>> crucial for my undertaking. Or has this been added in the meantime?
>
> I guess it was:
>
> commit 57608cfd8827a74237d264a197722e2c99f72da4
> Author: Joseph Schuchart
> Date:   Thu Jul 10 13:50:56 2014 +0200
>
>     perf script: Provide additional sample information on generic events
>
>     To python scripts, including pid, tid, and cpu for which the event
>     was recorded.
>
>     At the moment, the pointer to the sample struct is passed to
>     scripts, which seems to be of little use.
>
>     The patch puts this information in dictionaries for easy access by
>     Python scripts.
>
> commit 0f5f5bcd112292f14b75750dde7461463bb1c7bb
> Author: Joseph Schuchart
> Date:   Thu Jul 10 13:50:51 2014 +0200
>
>     perf script: Add callchain to generic and tracepoint events
>
>     This provides valuable information for tracing performance problems.
>
>     Since this change alters the interface for the python scripts, also
>     adjust the script generation and the provided scripts.
>
>>
>>
>> This was also why I asked my initial question, which I want to repeat once
>> more: Is there a technical reason to not offer a "timer" software event to
>> perf? I'm a complete layman when it comes to Kernel internals, but from a user
>> point of view this would be awesome:
>
>> perf record --call-graph dwarf -e sw-timer -F 100 someapplication
>
>> This command would then create a timer in the kernel with a 100Hz frequency.
>> Whenever it fires, the callgraphs of all threads in $someapplication are
>> sampled and written to perf.data. Is this technically not feasible? Or is it
>> simply not implemented?
>
>> I'm experimenting with a libunwind based profiler, and with some ugly signal
>> hackery I can now grab backtraces by sending my application SIGUSR1. Based on
>
> Humm, can't you do the same thing with perf? I.e. you send SIGUSR1 to
> your app with the frequency you want, and then hook a 'perf probe' into
> your signal... /me tries some stuff, will get back with results...
>
>> that, I can probably create a profiling tool that fits my needs. I just wonder
>> why one cannot do the same with perf.
>
> - Arnaldo
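
For reference, the relation between the two callchain outputs quoted in
this thread can be sketched in a few lines of Python.  This is only an
illustration, not perf code: the tuple layout and the helper names
to_absolute() and render() are made up here, and the tree assumes that
asdf and lalala are children of bar, as the 49%/21% figures imply.
`-g fractal` prints each entry relative to its parent while `-g graph`
prints it relative to the total, so converting one into the other is a
running product down the tree.

```python
# Hypothetical helpers (not part of perf): convert relative-to-parent
# percentages, as in `-g fractal`, into absolute ones, as in `-g graph`.

def to_absolute(node, parent_abs=100.0):
    """node is (name, percent_of_parent, children); returns the same
    shape with percent_of_parent replaced by percent of the total."""
    name, rel, children = node
    abs_pct = parent_abs * rel / 100.0
    return (name, abs_pct, [to_absolute(c, abs_pct) for c in children])


def render(node, depth=0):
    """Return the tree as indented text lines, one entry per line."""
    name, pct, children = node
    lines = ["%s%.0f%% %s" % ("  " * depth, pct, name)]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


# The example from this thread, with costs relative to the parent.
fractal = ("foo", 100, [
    ("bar", 70, [("asdf", 70, []), ("lalala", 30, [])]),
    ("baz", 30, []),
])

graph = to_absolute(fractal)
print("\n".join(render(graph)))
```

Feeding it the fractal numbers from the example should give back the
graph numbers, i.e. asdf at 70% of bar's 70% ends up at 49% of the total.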