* [perf] howto switch from pfmon
@ 2009-06-22 20:54 Brice Goglin
2009-06-23 12:12 ` Andi Kleen
2009-06-23 13:14 ` Ingo Molnar
0 siblings, 2 replies; 41+ messages in thread
From: Brice Goglin @ 2009-06-22 20:54 UTC (permalink / raw)
To: Peter Zijlstra, paulus, mingo, LKML
Hello,
I am trying to play with perfcounters in current git (actually in latest
mmotm). I'd like to reproduce what I previously did with pfmon, but I
couldn't so far.
Something like
pfmon --follow-exec 'foobar' -e
CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_0,CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_1
-- <shell script>
gives the number of memory accesses to dram node #0 and #1 for all
processes whose name matches 'foobar'.
So there are several questions here:
1) is it possible to specify counter names like the above or do we have
to use raw counter numbers? I tried raw numbers from [1] without
success. How am I supposed to find and specify these raw numbers?
2) how do we specify "subevents"?
3) is there anything similar to --follow-exec, or --follow-pthreads for
getting separated outputs for each thread?
I guess there are still a lot of things on the TODOlist but I'd like to
understand a bit more where things are going. Sorry I didn't read all
the archives about this, there are way too many of them recently :)
thanks,
Brice
[1]
https://aiya.ms.mff.cuni.cz/svn/rip/trunk/doc/devel/native_events_barcelona.txt
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-22 20:54 [perf] howto switch from pfmon Brice Goglin
@ 2009-06-23 12:12 ` Andi Kleen
2009-06-23 12:23 ` Peter Zijlstra
2009-06-23 13:57 ` Ingo Molnar
2009-06-23 13:14 ` Ingo Molnar
1 sibling, 2 replies; 41+ messages in thread
From: Andi Kleen @ 2009-06-23 12:12 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, mingo, LKML
Brice Goglin <Brice.Goglin@inria.fr> writes:
> Hello,
>
> I am trying to play with perfcounters in current git (actually in latest
> mmotm). I'd like to reproduce what I previously did with pfmon, but I
> couldn't so far.
>
> Something like
> pfmon --follow-exec 'foobar' -e
> CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_0,CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_1
> -- <shell script>
> gives the number of memory accesses to dram node #0 and #1 for all
> processes whose name matches 'foobar'.

My understanding based on recent emails on the topic is that the
perfctr gods decreed you are not to do any of this because they cannot
think of a use case for it, therefore none exist.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 12:12 ` Andi Kleen
@ 2009-06-23 12:23 ` Peter Zijlstra
2009-06-23 13:57 ` Ingo Molnar
1 sibling, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2009-06-23 12:23 UTC (permalink / raw)
To: Andi Kleen; +Cc: Brice Goglin, paulus, mingo, LKML
On Tue, 2009-06-23 at 14:12 +0200, Andi Kleen wrote:
> Brice Goglin <Brice.Goglin@inria.fr> writes:
>
> > Hello,
> >
> > I am trying to play with perfcounters in current git (actually in latest
> > mmotm). I'd like to reproduce what I previously did with pfmon, but I
> > couldn't so far.
> >
> > Something like
> > pfmon --follow-exec 'foobar' -e
> > CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_0,CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_1
> > -- <shell script>
> > gives the number of memory accesses to dram node #0 and #1 for all
> > processes whose name matches 'foobar'.
>
> My understanding based on recent emails on the topic is that the
> perfctr gods decreed you are not to do any of this because they cannot
> think of a use case for it, therefore none exist.

I wouldn't put it like that. But we haven't gotten around to
implementing uncore pmu stuff -- assuming that is what was meant.

What would be accurate is to say that we think uncore is a lot less
interesting than a lot of other pmu features.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 12:12 ` Andi Kleen
2009-06-23 12:23 ` Peter Zijlstra
@ 2009-06-23 13:57 ` Ingo Molnar
1 sibling, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-06-23 13:57 UTC (permalink / raw)
To: Andi Kleen; +Cc: Brice Goglin, Peter Zijlstra, paulus, LKML
* Andi Kleen <andi@firstfloor.org> wrote:
> Brice Goglin <Brice.Goglin@inria.fr> writes:
>
> > Hello,
> >
> > I am trying to play with perfcounters in current git (actually in latest
> > mmotm). I'd like to reproduce what I previously did with pfmon, but I
> > couldn't so far.
> >
> > Something like
> > pfmon --follow-exec 'foobar' -e
> > CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_0,CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_1
> > -- <shell script>
> > gives the number of memory accesses to dram node #0 and #1 for all
> > processes whose name matches 'foobar'.
>
> My understanding based on recent emails on the topic is that the
> perfctr gods decreed you are not to do any of this because they
> cannot think of a use case for it, therefore none exist.

You are working for Intel, right? Is the trolling of AMD related
threads now an officially sanctioned activity by Intel, or do you do
it out of personal motivation, in your free time? I'd really like to
know, because what you do here is quite unprofessional and quite a
distraction.

[ Btw., 'perfctr' is the name of another project, the one you wanted
  to attack here is called 'perfcounters'. ]

Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-22 20:54 [perf] howto switch from pfmon Brice Goglin
2009-06-23 12:12 ` Andi Kleen
@ 2009-06-23 13:14 ` Ingo Molnar
2009-06-23 13:22 ` Peter Zijlstra
` (3 more replies)
1 sibling, 4 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-06-23 13:14 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML
* Brice Goglin <Brice.Goglin@inria.fr> wrote:
> Hello,
>
> I am trying to play with perfcounters in current git (actually in
> latest mmotm). I'd like to reproduce what I previously did with
> pfmon, but I couldn't so far.
>
> Something like
> pfmon --follow-exec 'foobar' -e
> CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_0,CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_1
> -- <shell script>
> gives the number of memory accesses to dram node #0 and #1 for all
> processes whose name matches 'foobar'.
>
> So there are several questions here:
> 1) is it possible to specify counter names like the above or do we have
> to use raw counter numbers? I tried raw numbers from [1] without
> success. How am I supposed to find and specify these raw numbers?
> 2) how do we specify "subevents"?
> 3) is there anything similar to --follow-exec, or --follow-pthreads for
> getting separated outputs for each thread?
>
> I guess there are still a lot of things on the TODOlist but I'd
> like to understand a bit more where things are going. Sorry I
> didn't read all the archives about this, there are way too many of
> them recently :)

Yeah, there's indeed still a lot on the TODO list :-)

CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE is a Barcelona hardware event,
so if you know that it maps to raw ID 0x100000e0 then you can always
extend the events that 'perf' knows about via raw events:

 $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
 Time: 0.186

 Performance counter stats for './hackbench 10':

     4381248335  cycles
     1964394846  instructions         #      0.448 IPC
            838  raw 0x1000ffe0

    0.215382037  seconds time elapsed.

That 'r1000ffe0' is the raw event. You can also do a profile with
such events:

 perf record -f -e r1000ffe0 ./hackbench 10

and look at it via 'perf report'.

Figuring out raw codes is certainly avoidable, we could probably
integrate all the oprofile (and PAPI) event names into perf too,
from the /usr/share/oprofile/ event lists perhaps - for easier
migration for those who got used to those event names. It also gives
a wider set of events - which is useful if you got used to any
specific name.

The Barcelona events are listed in section 3.14 of "BIOS and Kernel
Developer's Guide for AMD Family 10h Processors", that's where all
the projects take these symbols from.

If you want to contribute then creating such tables for 'perf', for
model-specific events would certainly be useful.

[ Note, there's no need to specify any --follow-* flags as that is
  implicit in 'perf'. (and you'll probably also notice that perf
  stat is a lot faster at following fast-forking or
  context-switching workloads than is pfmon, because it's not ptrace
  based.) ]

And please let us know if you see any weirdness/difficulty while
using 'perf' or if you just notice some quirky thing in the tool.

Thanks,

Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 13:14 ` Ingo Molnar
@ 2009-06-23 13:22 ` Peter Zijlstra
2009-06-23 13:38 ` Ingo Molnar
2009-06-23 13:25 ` Ingo Molnar
` (2 subsequent siblings)
3 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2009-06-23 13:22 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Brice Goglin, paulus, LKML
On Tue, 2009-06-23 at 15:14 +0200, Ingo Molnar wrote:
> * Brice Goglin <Brice.Goglin@inria.fr> wrote:
>
> > Hello,
> >
> > I am trying to play with perfcounters in current git (actually in
> > latest mmotm). I'd like to reproduce what I previously did with
> > pfmon, but I couldn't so far.
> >
> > Something like
> > pfmon --follow-exec 'foobar' -e
> > CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_0,CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_1
> > -- <shell script>
> > gives the number of memory accesses to dram node #0 and #1 for all
> > processes whose name matches 'foobar'.
> >
> > So there are several questions here:
> > 1) is it possible to specify counter names like the above or do we have
> > to use raw counter numbers? I tried raw numbers from [1] without
> > success. How am I supposed to find and specify these raw numbers?
> > 2) how do we specify "subevents"?
> > 3) is there anything similar to --follow-exec, or --follow-pthreads for
> > getting separated outputs for each thread?
> >
> > I guess there are still a lot of things on the TODOlist but I'd
> > like to understand a bit more where things are going. Sorry I
> > didn't read all the archives about this, there are way too many of
> > them recently :)
>
> Yeah, there's indeed still a lot on the TODO list :-)
>
> CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE is a Barcelona hardware event,
> so if you know that it maps to raw ID 0x100000e0 then you can always
> extend the events that 'perf' knows about via raw events:
>
>  $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
>  Time: 0.186
>
>  Performance counter stats for './hackbench 10':
>
>      4381248335  cycles
>      1964394846  instructions         #      0.448 IPC
>             838  raw 0x1000ffe0
>
>     0.215382037  seconds time elapsed.

Just to clarify,

The event code is 1E0h, and Ingo used a FFh unit mask.

These are combined using the arch masks below:

#define K7_EVNTSEL_EVENT_MASK 0x7000000FFULL
#define K7_EVNTSEL_UNIT_MASK 0x00000FF00ULL

to form the raw event code used: 0x1000ffe0
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 13:22 ` Peter Zijlstra
@ 2009-06-23 13:38 ` Ingo Molnar
0 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-06-23 13:38 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Brice Goglin, paulus, LKML
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2009-06-23 at 15:14 +0200, Ingo Molnar wrote:
> > * Brice Goglin <Brice.Goglin@inria.fr> wrote:
> >
> > > Hello,
> > >
> > > I am trying to play with perfcounters in current git (actually in
> > > latest mmotm). I'd like to reproduce what I previously did with
> > > pfmon, but I couldn't so far.
> > >
> > > Something like
> > > pfmon --follow-exec 'foobar' -e
> > > CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_0,CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE:LOCAL_TO_1
> > > -- <shell script>
> > > gives the number of memory accesses to dram node #0 and #1 for all
> > > processes whose name matches 'foobar'.
> > >
> > > So there are several questions here:
> > > 1) is it possible to specify counter names like the above or do we have
> > > to use raw counter numbers? I tried raw numbers from [1] without
> > > success. How am I supposed to find and specify these raw numbers?
> > > 2) how do we specify "subevents"?
> > > 3) is there anything similar to --follow-exec, or --follow-pthreads for
> > > getting separated outputs for each thread?
> > >
> > > I guess there are still a lot of things on the TODOlist but I'd
> > > like to understand a bit more where things are going. Sorry I
> > > didn't read all the archives about this, there are way too many of
> > > them recently :)
> >
> > Yeah, there's indeed still a lot on the TODO list :-)
> >
> > CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE is a Barcelona hardware event,
> > so if you know that it maps to raw ID 0x100000e0 then you can always
> > extend the events that 'perf' knows about via raw events:
> >
> >  $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
> >  Time: 0.186
> >
> >  Performance counter stats for './hackbench 10':
> >
> >      4381248335  cycles
> >      1964394846  instructions         #      0.448 IPC
> >             838  raw 0x1000ffe0
> >
> >     0.215382037  seconds time elapsed.
>
> Just to clarify,
>
> The event code is 1E0h, and Ingo used a FFh unit mask.
>
> These are combined using the arch masks below:
>
> #define K7_EVNTSEL_EVENT_MASK 0x7000000FFULL
> #define K7_EVNTSEL_UNIT_MASK 0x00000FF00ULL
>
> to form the raw event code used: 0x1000ffe0

Yes. The individual node mappings are 01, 02 .. 80 - ff is 'all 8
nodes'.

Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
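[ Editorial sketch, not part of the original thread. ]
A minimal sketch of the encoding Peter describes above, assuming the event select layout implied by the two quoted masks: event bits 7:0 in config bits 7:0, the remaining event bits starting at config bit 32, and the unit mask in bits 15:8. The function names are hypothetical and are not part of perf. With event code 1E0h and unit mask FFh this yields r10000ffe0, the value given in Ingo's 13:47 correction later in the thread.

```python
# Masks as quoted in Peter's message above.
K7_EVNTSEL_EVENT_MASK = 0x7000000FF
K7_EVNTSEL_UNIT_MASK = 0x00000FF00


def raw_config(event_code, unit_mask):
    """Pack an event code (e.g. 0x1E0) and a unit mask into a raw config.

    Assumed layout: event bits 7:0 stay in config bits 7:0, event bits
    above 7 move up to config bit 32, the unit mask sits in bits 15:8.
    """
    event_bits = (event_code & 0xFF) | ((event_code >> 8) << 32)
    unit_bits = (unit_mask << 8) & K7_EVNTSEL_UNIT_MASK
    return (event_bits & K7_EVNTSEL_EVENT_MASK) | unit_bits


def perf_raw_name(event_code, unit_mask):
    """Format the config as the rNNN string passed to 'perf stat -e'."""
    return "r%x" % raw_config(event_code, unit_mask)


def node_unit_mask(node):
    """Hypothetical helper: one bit per target node (01, 02 .. 80)."""
    return 1 << node


print(perf_raw_name(0x1E0, 0xFF))               # r10000ffe0
print(perf_raw_name(0x1E0, node_unit_mask(0)))  # r1000001e0
print(perf_raw_name(0x1E0, node_unit_mask(5)))  # r1000020e0
```

The per-node strings match the r1000001e0 .. r1000080e0 events used further down the thread.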
* Re: [perf] howto switch from pfmon
2009-06-23 13:14 ` Ingo Molnar
2009-06-23 13:22 ` Peter Zijlstra
@ 2009-06-23 13:25 ` Ingo Molnar
2009-06-23 13:47 ` Ingo Molnar
2009-06-23 14:21 ` Brice Goglin
3 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-06-23 13:25 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML
* Ingo Molnar <mingo@elte.hu> wrote:
> > I guess there are still a lot of things on the TODOlist but I'd
> > like to understand a bit more where things are going. Sorry I
> > didn't read all the archives about this, there are way too many
> > of them recently :)
>
> Yeah, there's indeed still a lot on the TODO list :-)
>
> CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE is a Barcelona hardware event,
> so if you know that it maps to raw ID 0x100000e0 then you can
> always extend the events that 'perf' knows about via raw events:
>
>  $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10

Note, beyond using raw events, if you are interested in profiling
out 'locality badness' of your app, you are probably quite well
served with the default metrics on Barcelona as well:

 $ perf stat ~/hackbench 10
 Time: 0.205

 Performance counter stats for '/home/mingo/hackbench 10':

    2187.328436  task-clock-msecs     #      3.315 CPUs
          54554  context-switches     #      0.025 M/sec
           1160  CPU-migrations       #      0.001 M/sec
          17755  page-faults          #      0.008 M/sec
     4995437535  cycles               #   2283.808 M/sec
     2150881875  instructions         #      0.431 IPC
      644099534  cache-references     #    294.469 M/sec
        8516562  cache-misses         #      3.894 M/sec

    0.659895237  seconds time elapsed.

The cache-misses event is sufficiently well-represented to be
meaningful to profile based on it.

Raw DRAM access stats can be useful too - but they are generally
layered much later and your app can hurt already flip-flopping its
working set, without hitting too hard on the DRAM channels. So
perhaps 'cache-misses' is a good first-level approximation metric to
measure and profile along.

You can get a good (last-level-)cache-misses profile using the
auto-freq counters:

 perf record -e cache-misses -F 10000 ./your-app

The '-F 10000' tells the kernel to do 10 KHz sampling of your-app,
regardless of how frequent cache-misses are. The tools (perf report)
will take the weight of events into account, so it's all
well-normalized between the functions.

So you don't need to specify the 'sampling interval' by hand to get
a sufficient number of samples, you just specify a sampling
frequency - and the perfcounters subsystem takes care of the rest.
Also, your system won't over-sample nor under-sample if your
workload idles around occasionally.

Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 13:14 ` Ingo Molnar
2009-06-23 13:22 ` Peter Zijlstra
2009-06-23 13:25 ` Ingo Molnar
@ 2009-06-23 13:47 ` Ingo Molnar
2009-06-23 14:00 ` Brice Goglin
2009-06-23 14:21 ` Brice Goglin
3 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2009-06-23 13:47 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML
* Ingo Molnar <mingo@elte.hu> wrote:
>  $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
>  Time: 0.186

Correction: that should be r10000ffe0.

Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 13:47 ` Ingo Molnar
@ 2009-06-23 14:00 ` Brice Goglin
2009-06-23 14:36 ` Ingo Molnar
0 siblings, 1 reply; 41+ messages in thread
From: Brice Goglin @ 2009-06-23 14:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML
Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>
>> $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
>> Time: 0.186
>>
>
> Correction: that should be r10000ffe0.
>

Oh thanks a lot, it seems to work now!

One strange thing I noticed: sometimes perf reports that there were some
accesses to target numa nodes 4-7 while my box only has 4 numa nodes:
If I request counters only for the non-existing target numa nodes (4-7,
with -e r1000010e0 -e r1000020e0 -e r1000040e0 -e r1000080e0), I always
get 4 zeros.

But if I mix some counters from the existing nodes (0-3) with some
counters from non-existing nodes (4-7), the non-existing ones report
some small but non-empty values. Does it ring any bell?

Brice
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 14:00 ` Brice Goglin
@ 2009-06-23 14:36 ` Ingo Molnar
2009-06-23 15:22 ` Brice Goglin
0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2009-06-23 14:36 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML
* Brice Goglin <Brice.Goglin@inria.fr> wrote:
> Ingo Molnar wrote:
> > * Ingo Molnar <mingo@elte.hu> wrote:
> >
> >
> >> $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
> >> Time: 0.186
> >>
> >
> > Correction: that should be r10000ffe0.
>
> Oh thanks a lot, it seems to work now!

btw., it might make sense to expose NUMA imbalance via generic
enumeration. Right now we have:

 PERF_COUNT_HW_CPU_CYCLES          = 0,
 PERF_COUNT_HW_INSTRUCTIONS        = 1,
 PERF_COUNT_HW_CACHE_REFERENCES    = 2,
 PERF_COUNT_HW_CACHE_MISSES        = 3,
 PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4,
 PERF_COUNT_HW_BRANCH_MISSES       = 5,
 PERF_COUNT_HW_BUS_CYCLES          = 6,

plus we have cache stats:

 * Generalized hardware cache counters:
 *
 *   { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
 *   { read, write, prefetch } x
 *   { accesses, misses }

NUMA is here to stay, and expressing local versus remote access
stats seems useful. We could add two generic counters:

 PERF_COUNT_HW_RAM_LOCAL           = 7,
 PERF_COUNT_HW_RAM_REMOTE          = 8,

And map them properly on all CPUs that support such stats. They'd be
accessible via '-e ram-local-refs' and '-e ram-remote-refs' type of
event symbols.

What is your typical usage pattern of this counter? What (general)
kind of app do you profile with it and how do you make use of the
specific node masks?

Would a local/all-remote distinction be enough, or do you need to
make a distinction between the individual nodes to get the best
insight into the workload?

> One strange thing I noticed: sometimes perf reports that there
> were some accesses to target numa nodes 4-7 while my box only has
> 4 numa nodes: If I request counters only for the non-existing
> target numa nodes (4-7, with -e r1000010e0 -e r1000020e0 -e
> r1000040e0 -e r1000080e0), I always get 4 zeros.
>
> But if I mix some couinters from the existing nodes (0-3) with
> some counters from non-existing nodes (4-7), the non-existing ones
> report some small but non-empty values. Does it ring any bell?

I can see that too. I have a similar system (4 nodes), and if i use
the stats for nodes 4-7 (non-existent) i get:

phoenix:~> perf stat -e r1000010e0 -e r1000020e0 -e r1000040e0 -e r1000080e0 --repeat 10 ./hackbench 30
Time: 0.490
Time: 0.435
Time: 0.492
Time: 0.569
Time: 0.491
Time: 0.498
Time: 0.549
Time: 0.530
Time: 0.543
Time: 0.482

 Performance counter stats for './hackbench 30' (10 runs):

              0  raw 0x1000010e0      ( +-   0.000% )
              0  raw 0x1000020e0      ( +-   0.000% )
              0  raw 0x1000040e0      ( +-   0.000% )
              0  raw 0x1000080e0      ( +-   0.000% )

    0.610303953  seconds time elapsed.

( Note the --repeat option - that way you can repeat workloads and
  observe their statistical properties. )

If i try the first 4 nodes i get:

phoenix:~> perf stat -e r1000001e0 -e r1000002e0 -e r1000004e0 -e r1000008e0 --repeat 10 ./hackbench 30
Time: 0.403
Time: 0.431
Time: 0.406
Time: 0.421
Time: 0.461
Time: 0.423
Time: 0.495
Time: 0.462
Time: 0.434
Time: 0.459

 Performance counter stats for './hackbench 30' (10 runs):

       52255370  raw 0x1000001e0      ( +-   5.510% )
       46052950  raw 0x1000002e0      ( +-   8.067% )
       45966395  raw 0x1000004e0      ( +-  10.341% )
       63240044  raw 0x1000008e0      ( +-  11.707% )

    0.530894007  seconds time elapsed.

Quite noisy across runs - which is expected on NUMA, as the memory
allocations are not really deterministic and some more NUMA friendly
than others. This box has all relevant NUMA options enabled:

 CONFIG_NUMA=y
 CONFIG_K8_NUMA=y
 CONFIG_X86_64_ACPI_NUMA=y
 CONFIG_ACPI_NUMA=y

But if i 'mix' counters, i too get weird stats:

phoenix:~> perf stat -e r1000020e0 -e r1000040e0 -e r1000080e0 -e r10000ffe0 --repeat 10 ./hackbench 30
Time: 0.432
Time: 0.446
Time: 0.428
Time: 0.472
Time: 0.443
Time: 0.454
Time: 0.398
Time: 0.438
Time: 0.403
Time: 0.463

 Performance counter stats for './hackbench 30' (10 runs):

        2355436  raw 0x1000020e0      ( +-   8.989% )
              0  raw 0x1000040e0      ( +-   0.000% )
              0  raw 0x1000080e0      ( +-   0.000% )
      204768941  raw 0x10000ffe0      ( +-   0.788% )

    0.528447241  seconds time elapsed.

That 2355436 count for node 5 should have been zero.

Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 14:36 ` Ingo Molnar
@ 2009-06-23 15:22 ` Brice Goglin
2009-06-29 19:29 ` Ingo Molnar
0 siblings, 1 reply; 41+ messages in thread
From: Brice Goglin @ 2009-06-23 15:22 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML
Ingo Molnar wrote:
> btw., it might make sense to expose NUMA inbalance via generic
> enumeration. Right now we have:
>
>  PERF_COUNT_HW_CPU_CYCLES          = 0,
>  PERF_COUNT_HW_INSTRUCTIONS        = 1,
>  PERF_COUNT_HW_CACHE_REFERENCES    = 2,
>  PERF_COUNT_HW_CACHE_MISSES        = 3,
>  PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4,
>  PERF_COUNT_HW_BRANCH_MISSES       = 5,
>  PERF_COUNT_HW_BUS_CYCLES          = 6,
>
> plus we have cache stats:
>
>  * Generalized hardware cache counters:
>  *
>  *   { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
>  *   { read, write, prefetch } x
>  *   { accesses, misses }
>

By the way, is there a way to know which cache was actually used when we
request cache references/misses? Always the largest/top one by default?

> NUMA is here to stay, and expressing local versus remote access
> stats seems useful. We could add two generic counters:
>
>  PERF_COUNT_HW_RAM_LOCAL           = 7,
>  PERF_COUNT_HW_RAM_REMOTE          = 8,
>
> And map them properly on all CPUs that support such stats. They'd be
> accessible via '-e ram-local-refs' and '-e ram-remote-refs' type of
> event symbols.
>
> What is your typical usage pattern of this counter? What (general)
> kind of app do you profile with it and how do you make use of the
> specific node masks?
>
> Would a local/all-remote distinction be enough, or do you need to
> make a distinction between the individual nodes to get the best
> insight into the workload?
>

People here work on OpenMP runtime systems where you try to keep threads
and data together. So in the end, what's important is to maximize the
overall local/remote access ratio. But during development, it may be
useful to have a distinction between individual nodes so as to
understand what's going on. That said, we still have raw numbers when we
really need that many details, and I don't know if it'd be easy for you
to add a generic counter with a sort of node-number attribute.

(including part of your other email here since it's relevant)

> How many threads does your workload typically run, and how do you
> get their stats displayed?
>

In the aforementioned OpenMP stuff, we use pfmon to get the local/remote
numa memory access ratio of each thread. In this specific case, we bind
one thread per core (even with a O(1) scheduler, people tend to avoid
launching hundreds of threads on current machines). pfmon gives us
something similar to the output of 'perf stat' in a file whose filename
contains process and thread IDs. We apply our own custom script to
convert these many pfmon output files into a single summary saying for
each thread, its thread ID, its core binding, its individual numa node
access numbers and percentages, and if they were local or remote (with
the Barcelona counters we were talking about, you need to check where
you were running before you know if accesses to node X are actually
local or remote accesses).

thanks,
Brice
^ permalink raw reply [flat|nested] 41+ messages in thread
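[ Editorial sketch, not part of the original thread. ]
The last step of the summary Brice describes, classifying per-node counts as local or remote once the thread's home node is known, could look roughly like the sketch below. This is not Brice's script: the function name, the input shape, and the numbers in the usage line are all hypothetical and only illustrate the arithmetic.

```python
def local_remote_summary(node_counts, home_node):
    """Split per-node DRAM request counts into local and remote.

    node_counts: dict mapping target NUMA node -> request count for one
                 thread (hypothetical input shape).
    home_node:   the node the thread was bound to while it ran.
    Returns (local, remote, local_fraction); local_fraction is None
    when no requests were counted at all.
    """
    local = node_counts.get(home_node, 0)
    remote = sum(count for node, count in node_counts.items()
                 if node != home_node)
    total = local + remote
    if total == 0:
        return local, remote, None
    return local, remote, local / total


# Made-up counts for a thread bound to node 0 on a 4-node box.
print(local_remote_summary({0: 900, 1: 50, 2: 30, 3: 20}, home_node=0))
# (900, 100, 0.9)
```

The same per-node counts give a different answer for a thread bound elsewhere, which is why the binding has to be checked first.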
* Re: [perf] howto switch from pfmon
2009-06-23 15:22 ` Brice Goglin
@ 2009-06-29 19:29 ` Ingo Molnar
2009-08-06 16:59 ` Brice Goglin
0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2009-06-29 19:29 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML
* Brice Goglin <Brice.Goglin@inria.fr> wrote:
> > How many threads does your workload typically run, and how do
> > you get their stats displayed?
>
> In the aforementioned OpenMP stuff, we use pfmon to get the
> local/remote numa memory access ratio of each thread. In this
> specific case, we bind one thread per core (even with a O(1)
> scheduler, people tend to avoid launching hundreds of threads on
> current machines). pfmon gives us something similar to the output
> of 'perf stat' in a file whose filename contains process and
> thread IDs. We apply our own custom script to convert these many
> pfmon output files into a single summary saying for each thread,
> its thread ID, its core binding, its individual numa node access
> numbers and percentages, and if they were local or remote (with
> the Barcelona counters we were talking about, you need to check
> where you were running before you know if accesses to node X are
> actually local or remote accesses).

Update: based on your feedback the latest perfcounters tree includes
the following new perf record features:

 -s, --stat            per thread counts
 -n, --no-samples      don't sample

--stat instructs the kernel to gather precise per task/thread stats
and emits those counts to the data file. Via --no-samples one can do
non-profiling runs - i.e. only statistics collection.

The 'perf stat' pretty printing side is not fully implemented yet -
right now you can only see these stats if you look for
PERF_EVENT_READ counts in the raw event log:

 perf report -D | grep PERF_EVENT_READ

But the biggest piece, the kernel and perf record side is there
already.

What kind of output would you prefer? Maybe you'd like to take a
stab at implementing the perf report side?

Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-29 19:29 ` Ingo Molnar
@ 2009-08-06 16:59 ` Brice Goglin
2009-08-06 17:40 ` Peter Zijlstra
0 siblings, 1 reply; 41+ messages in thread
From: Brice Goglin @ 2009-08-06 16:59 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML
Ingo Molnar wrote:
> Update: based on your feedback the latest perfcounters tree includes
> the following new perf record features:
>
>  -s, --stat            per thread counts
>  -n, --no-samples      don't sample
>
> --stat instructs the kernel to gather precise per task/thread stats
> and emits those counts to the data file. Via --no-samples one can do
> non-profiling runs - i.e. only statistics collection.
>
> The 'perf stat' pretty printing side is not fully implemented yet -
> right now you can only see these stats if you look for
> PERF_EVENT_READ counts in the raw event log:
>
>  perf report -D | grep PERF_EVENT_READ
>

Ok I am getting remote/local accesses for my threads now. But I am not
sure which line corresponds to which event.

$ /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf record -f -s -e r1000001e0 -e r1000002e0 -e r1000004e0 -e r1000008e0 ./stream
[...]
$ /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf report -D | grep PERF_EVENT_READ
0x7cd0 [0x30]: PERF_EVENT_READ: 4651 4651 210827
0x7388 [0x30]: PERF_EVENT_READ: 4651 4651 241742
0x8cf0 [0x30]: PERF_EVENT_READ: 4651 4651 315938
0x9da0 [0x30]: PERF_EVENT_READ: 4651 4651 9461794
0x7208 [0x30]: PERF_EVENT_READ: 4651 4652 24954
0x8c90 [0x30]: PERF_EVENT_READ: 4651 4652 408056
0x7ca0 [0x30]: PERF_EVENT_READ: 4651 4652 8962423
0x9ce0 [0x30]: PERF_EVENT_READ: 4651 4652 9117
0x7df0 [0x30]: PERF_EVENT_READ: 4651 4653 21645
0x9d70 [0x30]: PERF_EVENT_READ: 4651 4653 23606
0x7358 [0x30]: PERF_EVENT_READ: 4651 4653 29266
0x8e70 [0x30]: PERF_EVENT_READ: 4651 4653 9339173
[...]

I can easily sort them by thread id, but I don't know how to match my 4
events with each group of 4 line.

Maybe perf report earned some better way to show per-thread statistics
in the meantime?

Brice
^ permalink raw reply [flat|nested] 41+ messages in thread
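[ Editorial sketch, not part of the original thread. ]
Grouping the PERF_EVENT_READ lines by thread, as Brice mentions, might be done along these lines. The line format is assumed from the output quoted in the message above (offset, size, pid, tid, value); the function name is hypothetical. Note that this only groups and orders by file offset. It does not solve the actual problem raised here, since nothing in these lines identifies which event a value belongs to.

```python
import re
from collections import defaultdict

# Format assumed from the 'perf report -D' output quoted above:
#   0x7cd0 [0x30]: PERF_EVENT_READ: 4651 4651 210827
READ_LINE = re.compile(
    r"^(0x[0-9a-f]+) \[(0x[0-9a-f]+)\]: PERF_EVENT_READ: (\d+) (\d+) (\d+)$"
)


def group_reads_by_thread(lines):
    """Return {(pid, tid): [(file_offset, value), ...]} sorted by offset."""
    per_thread = defaultdict(list)
    for line in lines:
        match = READ_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a PERF_EVENT_READ line
        offset, _size, pid, tid, value = match.groups()
        per_thread[(int(pid), int(tid))].append((int(offset, 16), int(value)))
    return {key: sorted(reads) for key, reads in per_thread.items()}


sample = [
    "0x7cd0 [0x30]: PERF_EVENT_READ: 4651 4651 210827",
    "0x7388 [0x30]: PERF_EVENT_READ: 4651 4651 241742",
    "0x7208 [0x30]: PERF_EVENT_READ: 4651 4652 24954",
    "[...]",
]
for (pid, tid), reads in sorted(group_reads_by_thread(sample).items()):
    print(pid, tid, [value for _offset, value in reads])
```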
* Re: [perf] howto switch from pfmon
2009-08-06 16:59 ` Brice Goglin
@ 2009-08-06 17:40 ` Peter Zijlstra
2009-08-06 17:48 ` Brice Goglin
2009-08-06 19:01 ` [perf] howto switch from pfmon Brice Goglin
0 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2009-08-06 17:40 UTC (permalink / raw)
To: Brice Goglin; +Cc: Ingo Molnar, paulus, LKML
On Thu, 2009-08-06 at 18:59 +0200, Brice Goglin wrote:
> I can easily sort them by thread id, but I don't know how to match my 4
> events with each group of 4 line.
>
> Maybe perf report earned some better way to show per-thread statistics
> in the meantime?

Nah, it needs some love..

The below might be a starting point, it compiles, didn't check the
result. builtin-stat might be a nice place to look for more bits..

---
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8cb58d6..c053fd8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -112,7 +112,9 @@ struct read_event {
 	struct perf_event_header header;
 	u32 pid,tid;
 	u64 value;
-	u64 format[3];
+	u64 time_enabled;
+	u64 time_running;
+	u64 id;
 };

 typedef union event_union {
@@ -1690,14 +1692,37 @@ static void trace_event(event_t *event)
 	dprintf(".\n");
 }

+static struct perf_header *header;
+
+static struct perf_counter_attr *perf_header__find_attr(u64 id)
+{
+	int i;
+
+	for (i = 0; i < header->attrs; i++) {
+		struct perf_header_attr *attr = header->attr[i];
+		int j;
+
+		for (j = 0; j < attr->ids; j++) {
+			if (attr->id[j] == id)
+				return &attr->attr;
+		}
+	}
+
+	return NULL;
+}
+
 static int
 process_read_event(event_t *event, unsigned long offset, unsigned long head)
 {
-	dprintf("%p [%p]: PERF_EVENT_READ: %d %d %Lu\n",
+	struct perf_counter_attr *attr = perf_header__find_attr(event->read.id);
+
+	dprintf("%p [%p]: PERF_EVENT_READ: %d %d %s %Lu\n",
 			(void *)(offset + head),
 			(void *)(long)(event->header.size),
 			event->read.pid,
 			event->read.tid,
+			attr ? __event_name(attr->type, attr->config)
+			     : "FAIL",
 			event->read.value);

 	return 0;
@@ -1743,8 +1768,6 @@ process_event(event_t *event, unsigned long offset, unsigned long head)
 	return 0;
 }

-static struct perf_header *header;
-
 static u64 perf_header__sample_type(void)
 {
 	u64 sample_type = 0;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 7bdad8d..4858d83 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -223,9 +239,15 @@ char *event_name(int counter)
 {
 	u64 config = attrs[counter].config;
 	int type = attrs[counter].type;
+
+	return __event_name(type, config);
+}
+
+char *__event_name(int type, u64 config)
+{
 	static char buf[32];

-	if (attrs[counter].type == PERF_TYPE_RAW) {
+	if (type == PERF_TYPE_RAW) {
 		sprintf(buf, "raw 0x%llx", config);
 		return buf;
 	}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 1ea5d09..192a962 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -10,6 +10,7 @@ extern int nr_counters;
 extern struct perf_counter_attr attrs[MAX_COUNTERS];

 extern char *event_name(int ctr);
+extern char *__event_name(int type, u64 config);

 extern int parse_events(const struct option *opt, const char *str, int unset);

^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-08-06 17:40 ` Peter Zijlstra
@ 2009-08-06 17:48 ` Brice Goglin
2009-08-06 17:59 ` Peter Zijlstra
2009-08-06 18:57 ` [PATCH] perf tools: Fix reading of perf.data file header Peter Zijlstra
2009-08-06 19:01 ` [perf] howto switch from pfmon Brice Goglin
1 sibling, 2 replies; 41+ messages in thread
From: Brice Goglin @ 2009-08-06 17:48 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, paulus, LKML

Peter Zijlstra wrote:
> On Thu, 2009-08-06 at 18:59 +0200, Brice Goglin wrote:
>
>> I can easily sort them by thread id, but I don't know how to match my 4
>> events with each group of 4 line.
>>
>> Maybe perf report earned some better way to show per-thread statistics
>> in the meantime?
>
> Nah, it needs some love..
>
> The below might be a starting point, it compiles, didn't check the
> result. builtin-stat might be a nice place to look for more bits..

Thanks, now I get for each thread:

0x8fc0 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 209113
0x9698 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 307215
0x9cf8 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 9203221
0x8bb8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000001e0 494628

Looks like it fails to stringify my raw events except the first one.

Brice
* Re: [perf] howto switch from pfmon
2009-08-06 17:48 ` Brice Goglin
@ 2009-08-06 17:59 ` Peter Zijlstra
2009-08-06 18:57 ` [PATCH] perf tools: Fix reading of perf.data file header Peter Zijlstra
1 sibling, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2009-08-06 17:59 UTC (permalink / raw)
To: Brice Goglin; +Cc: Ingo Molnar, paulus, LKML

On Thu, 2009-08-06 at 19:48 +0200, Brice Goglin wrote:
> Peter Zijlstra wrote:
> > On Thu, 2009-08-06 at 18:59 +0200, Brice Goglin wrote:
> >
> >> I can easily sort them by thread id, but I don't know how to match my 4
> >> events with each group of 4 line.
> >>
> >> Maybe perf report earned some better way to show per-thread statistics
> >> in the meantime?
> >
> > Nah, it needs some love..
> >
> > The below might be a starting point, it compiles, didn't check the
> > result. builtin-stat might be a nice place to look for more bits..
>
> Thanks, now I get for each thread:
>
> 0x8fc0 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 209113
> 0x9698 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 307215
> 0x9cf8 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 9203221
> 0x8bb8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000001e0 494628
>
> Looks like it fails to stringify my raw events except the first one.

/me mumbles intelligible

I'll try and sort that out after dinner, unless you beat me to it :-)
* [PATCH] perf tools: Fix reading of perf.data file header
2009-08-06 17:48 ` Brice Goglin
2009-08-06 17:59 ` Peter Zijlstra
@ 2009-08-06 18:57 ` Peter Zijlstra
2009-08-06 19:03 ` Brice Goglin
` (2 more replies)
1 sibling, 3 replies; 41+ messages in thread
From: Peter Zijlstra @ 2009-08-06 18:57 UTC (permalink / raw)
To: Brice Goglin; +Cc: Ingo Molnar, paulus, LKML

On Thu, 2009-08-06 at 19:48 +0200, Brice Goglin wrote:
> Peter Zijlstra wrote:
> > On Thu, 2009-08-06 at 18:59 +0200, Brice Goglin wrote:
> >
> >> I can easily sort them by thread id, but I don't know how to match my 4
> >> events with each group of 4 line.
> >>
> >> Maybe perf report earned some better way to show per-thread statistics
> >> in the meantime?
> >
> > Nah, it needs some love..
> >
> > The below might be a starting point, it compiles, didn't check the
> > result. builtin-stat might be a nice place to look for more bits..
>
> Thanks, now I get for each thread:
>
> 0x8fc0 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 209113
> 0x9698 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 307215
> 0x9cf8 [0x30]: PERF_EVENT_READ: 6268 6268 FAIL 9203221
> 0x8bb8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000001e0 494628
>
> Looks like it fails to stringify my raw events except the first one.

---
Subject: perf tools: Fix reading of perf.data file header

A silly mistake made us re-read the first attribute for every attribute.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 450384b..95a44bc 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -213,9 +213,10 @@ struct perf_header *perf_header__read(int fd)
 
 	for (i = 0; i < nr_attrs; i++) {
 		struct perf_header_attr *attr;
-		off_t tmp = lseek(fd, 0, SEEK_CUR);
+		off_t tmp;
 
 		do_read(fd, &f_attr, sizeof(f_attr));
+		tmp = lseek(fd, 0, SEEK_CUR);
 
 		attr = perf_header_attr__new(&f_attr.attr);
 
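The effect of moving the lseek() can be reproduced outside perf. The sketch below assumes, as the patch description suggests, that the loop jumps elsewhere in the file for each attribute and then returns to the saved offset. The record layout (struct demo_rec) and both helpers are hypothetical and much simpler than the real perf.data header. With the offset saved before the read, every iteration returns to the start of record 0; with the offset saved after the read, the loop advances.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical, simplified on-disk record: not the real perf.data layout. */
struct demo_rec {
	uint64_t config;	/* stands in for the counter attribute */
	uint64_t id_offset;	/* file offset of this record's id */
};

/*
 * Read n records from offset 0. After each record, jump to its id and then
 * return to the saved position. With save_after_read == 0 the position is
 * saved before the record is read (the buggy order), so each iteration
 * re-reads record 0.
 */
static int demo_read_records(int fd, int n, int save_after_read,
			     uint64_t *configs, uint64_t *ids)
{
	int i;

	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;

	for (i = 0; i < n; i++) {
		struct demo_rec rec;
		off_t tmp = lseek(fd, 0, SEEK_CUR);	/* buggy save point */

		if (read(fd, &rec, sizeof(rec)) != (ssize_t)sizeof(rec))
			return -1;
		if (save_after_read)
			tmp = lseek(fd, 0, SEEK_CUR);	/* fixed save point */

		if (lseek(fd, (off_t)rec.id_offset, SEEK_SET) < 0 ||
		    read(fd, &ids[i], sizeof(ids[i])) != (ssize_t)sizeof(ids[i]))
			return -1;

		configs[i] = rec.config;
		if (lseek(fd, tmp, SEEK_SET) < 0)	/* back for the next record */
			return -1;
	}

	return 0;
}

/* Build a temporary file holding three records followed by their ids. */
static int demo_make_file(void)
{
	FILE *f = tmpfile();
	struct demo_rec recs[3];
	uint64_t ids[3] = { 11, 22, 33 };
	int i;

	if (!f)
		return -1;

	for (i = 0; i < 3; i++) {
		recs[i].config = 0x1e0 + 0x100 * (uint64_t)i;
		recs[i].id_offset = sizeof(recs) + (size_t)i * sizeof(ids[0]);
	}

	if (fwrite(recs, sizeof(recs), 1, f) != 1 ||
	    fwrite(ids, sizeof(ids), 1, f) != 1 || fflush(f) != 0)
		return -1;

	return fileno(f);
}
```

Calling demo_read_records() on the descriptor from demo_make_file() with save_after_read set to 0 fills every slot with the first record's config and id. That is consistent with the symptom reported earlier in the thread, where only the first raw event could be named.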
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-06 18:57 ` [PATCH] perf tools: Fix reading of perf.data file header Peter Zijlstra
@ 2009-08-06 19:03 ` Brice Goglin
2009-08-06 19:59 ` Ingo Molnar
2009-08-07 6:37 ` [tip:perfcounters/urgent] perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header tip-bot for Peter Zijlstra
2009-08-07 7:39 ` tip-bot for Peter Zijlstra
2 siblings, 1 reply; 41+ messages in thread
From: Brice Goglin @ 2009-08-06 19:03 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, paulus, LKML

$ /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf report -D | grep _READ | sort -k5
0x8bb8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000001e0 494628
0x8fc0 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000002e0 209113
0x9698 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000004e0 307215
0x9cf8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000008e0 9203221
0x8a08 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000001e0 9210788
0x9020 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000002e0 302344
0x9608 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000004e0 198705
0x9d28 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000008e0 473471
[...]

Now I know which thread actually reads from where.
Looks like we're good to go now! thanks a lot Peter!

Tested-by: Brice Goglin <Brice.Goglin@inria.fr>

Brice

Peter Zijlstra wrote:
> Subject: perf tools: Fix reading of perf.data file header
>
> A silly mistake made us re-read the first attribute for every attribute.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 450384b..95a44bc 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -213,9 +213,10 @@ struct perf_header *perf_header__read(int fd)
>
> 	for (i = 0; i < nr_attrs; i++) {
> 		struct perf_header_attr *attr;
> -		off_t tmp = lseek(fd, 0, SEEK_CUR);
> +		off_t tmp;
>
> 		do_read(fd, &f_attr, sizeof(f_attr));
> +		tmp = lseek(fd, 0, SEEK_CUR);
>
> 		attr = perf_header_attr__new(&f_attr.attr);
>
* Re: [PATCH] perf tools: Fix reading of perf.data file header 2009-08-06 19:03 ` Brice Goglin @ 2009-08-06 19:59 ` Ingo Molnar 2009-08-06 20:03 ` Brice Goglin 2009-08-06 23:35 ` Brice Goglin 0 siblings, 2 replies; 41+ messages in thread From: Ingo Molnar @ 2009-08-06 19:59 UTC (permalink / raw) To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML * Brice Goglin <Brice.Goglin@inria.fr> wrote: > $ /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf report -D | > grep _READ | sort -k5 > 0x8bb8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000001e0 494628 > 0x8fc0 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000002e0 209113 > 0x9698 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000004e0 307215 > 0x9cf8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000008e0 9203221 > 0x8a08 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000001e0 9210788 > 0x9020 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000002e0 302344 > 0x9608 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000004e0 198705 > 0x9d28 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000008e0 473471 > [...] > > Now I know which thread actually reads from where. > Looks like we're good to go now! thanks a lot Peter! > > Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Thanks Brice. It would be nice to add this as some "perf report -s/--stats" flag, to not have to go via -D (which is a 'print debug output' kind of ad-hoc thing and subject to format changes in the future). Would you be interested in sending a patch that adds that flag to 'perf report', to print out these statistics entries (if any), in a tabular form suitable for your purposes? Below is a past patch to builtin-report.c that shows how to add new options. 
Ingo --------------------> >From 429764873cf3fc3e73142872a674bb27cda589c1 Mon Sep 17 00:00:00 2001 From: Mike Galbraith <efault@gmx.de> Date: Thu, 2 Jul 2009 08:09:46 +0200 Subject: [PATCH] perf_counter tools: Enable kernel module symbol loading in tools Add the -m/--modules option to perf report and perf annotate, which enables live module symbol/image loading. To be used with -k/--vmlinux. (Also give perf annotate a -P/--full-paths option.) Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246514986.13293.48.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- tools/perf/builtin-annotate.c | 25 ++++++++++++++++++++----- tools/perf/builtin-report.c | 9 ++++++++- tools/perf/builtin-top.c | 12 ++++++++++-- 3 files changed, 38 insertions(+), 8 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 8820568..08ea6c5 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -43,6 +43,10 @@ static int dump_trace = 0; static int verbose; +static int modules; + +static int full_paths; + static int print_line; static unsigned long page_size; @@ -171,7 +175,7 @@ static int load_kernel(void) if (!kernel_dso) return -1; - err = dso__load_kernel(kernel_dso, vmlinux, NULL, verbose, 0); + err = dso__load_kernel(kernel_dso, vmlinux, NULL, verbose, modules); if (err <= 0) { dso__delete(kernel_dso); kernel_dso = NULL; @@ -1268,19 +1272,25 @@ static void print_summary(char *filename) static void annotate_sym(struct dso *dso, struct symbol *sym) { - char *filename = dso->name; + char *filename = dso->name, *d_filename; u64 start, end, len; char command[PATH_MAX*2]; FILE *file; if (!filename) return; - if (dso == kernel_dso) + if (sym->module) + filename = sym->module->path; + else if (dso == kernel_dso) filename = vmlinux; start = sym->obj_start; if (!start) start 
= sym->start; + if (full_paths) + d_filename = filename; + else + d_filename = basename(filename); end = start + sym->end - sym->start + 1; len = sym->end - sym->start; @@ -1291,13 +1301,14 @@ static void annotate_sym(struct dso *dso, struct symbol *sym) } printf("\n\n------------------------------------------------\n"); - printf(" Percent | Source code & Disassembly of %s\n", filename); + printf(" Percent | Source code & Disassembly of %s\n", d_filename); printf("------------------------------------------------\n"); if (verbose >= 2) printf("annotating [%p] %30s : [%p] %30s\n", dso, dso->name, sym, sym->name); - sprintf(command, "objdump --start-address=0x%016Lx --stop-address=0x%016Lx -dS %s", (u64)start, (u64)end, filename); + sprintf(command, "objdump --start-address=0x%016Lx --stop-address=0x%016Lx -dS %s|grep -v %s", + (u64)start, (u64)end, filename, filename); if (verbose >= 3) printf("doing: %s\n", command); @@ -1472,8 +1483,12 @@ static const struct option options[] = { OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace, "dump raw trace in ASCII"), OPT_STRING('k', "vmlinux", &vmlinux, "file", "vmlinux pathname"), + OPT_BOOLEAN('m', "modules", &modules, + "load module symbols - WARNING: use only with -k and LIVE kernel"), OPT_BOOLEAN('l', "print-line", &print_line, "print matching source lines (may be slow)"), + OPT_BOOLEAN('P', "full-paths", &full_paths, + "Don't shorten the displayed pathnames"), OPT_END() }; diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 38d136f..b44476c 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -46,6 +46,8 @@ static int dump_trace = 0; static int verbose; #define eprintf(x...) 
do { if (verbose) fprintf(stderr, x); } while (0) +static int modules; + static int full_paths; static unsigned long page_size; @@ -188,7 +190,7 @@ static int load_kernel(void) if (!kernel_dso) return -1; - err = dso__load_kernel(kernel_dso, vmlinux, NULL, verbose, 0); + err = dso__load_kernel(kernel_dso, vmlinux, NULL, verbose, modules); if (err <= 0) { dso__delete(kernel_dso); kernel_dso = NULL; @@ -648,6 +650,9 @@ sort__sym_print(FILE *fp, struct hist_entry *self) ret += fprintf(fp, "[%c] %s", self->dso == kernel_dso ? 'k' : self->dso == hypervisor_dso ? 'h' : '.', self->sym->name); + + if (self->sym->module) + ret += fprintf(fp, "\t[%s]", self->sym->module->name); } else { ret += fprintf(fp, "%#016llx", (u64)self->ip); } @@ -1710,6 +1715,8 @@ static const struct option options[] = { OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace, "dump raw trace in ASCII"), OPT_STRING('k', "vmlinux", &vmlinux, "file", "vmlinux pathname"), + OPT_BOOLEAN('m', "modules", &modules, + "load module symbols - WARNING: use only with -k and LIVE kernel"), OPT_STRING('s', "sort", &sort_order, "key[,key2...]", "sort by key(s): pid, comm, dso, symbol, parent"), OPT_BOOLEAN('P', "full-paths", &full_paths, diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 9bb25fc..aa044ea 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -66,6 +66,7 @@ static unsigned int page_size; static unsigned int mmap_pages = 16; static int freq = 0; static int verbose = 0; +static char *vmlinux = NULL; static char *sym_filter; static unsigned long filter_start; @@ -265,7 +266,10 @@ static void print_sym_table(void) printf("%9.1f %10ld - ", syme->weight, syme->snap_count); color_fprintf(stdout, color, "%4.1f%%", pcnt); - printf(" - %016llx : %s\n", sym->start, sym->name); + printf(" - %016llx : %s", sym->start, sym->name); + if (sym->module) + printf("\t[%s]", sym->module->name); + printf("\n"); } } @@ -359,12 +363,13 @@ static int parse_symbols(void) { struct rb_node *node; 
struct symbol *sym; + int modules = vmlinux ? 1 : 0; kernel_dso = dso__new("[kernel]", sizeof(struct sym_entry)); if (kernel_dso == NULL) return -1; - if (dso__load_kernel(kernel_dso, NULL, symbol_filter, 1, 0) <= 0) + if (dso__load_kernel(kernel_dso, vmlinux, symbol_filter, verbose, modules) <= 0) goto out_delete_dso; node = rb_first(&kernel_dso->syms); @@ -680,6 +685,7 @@ static const struct option options[] = { "system-wide collection from all CPUs"), OPT_INTEGER('C', "CPU", &profile_cpu, "CPU to profile on"), + OPT_STRING('k', "vmlinux", &vmlinux, "file", "vmlinux pathname"), OPT_INTEGER('m', "mmap-pages", &mmap_pages, "number of mmap data pages"), OPT_INTEGER('r', "realtime", &realtime_prio, @@ -709,6 +715,8 @@ int cmd_top(int argc, const char **argv, const char *prefix __used) { int counter; + symbol__init(); + page_size = sysconf(_SC_PAGE_SIZE); argc = parse_options(argc, argv, options, top_usage, 0); ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-06 19:59 ` Ingo Molnar
@ 2009-08-06 20:03 ` Brice Goglin
2009-08-06 23:35 ` Brice Goglin
1 sibling, 0 replies; 41+ messages in thread
From: Brice Goglin @ 2009-08-06 20:03 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML

Ingo Molnar wrote:
> * Brice Goglin <Brice.Goglin@inria.fr> wrote:
>
>> $ /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf report -D |
>> grep _READ | sort -k5
>> 0x8bb8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000001e0 494628
>> 0x8fc0 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000002e0 209113
>> 0x9698 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000004e0 307215
>> 0x9cf8 [0x30]: PERF_EVENT_READ: 6268 6268 raw 0x1000008e0 9203221
>> 0x8a08 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000001e0 9210788
>> 0x9020 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000002e0 302344
>> 0x9608 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000004e0 198705
>> 0x9d28 [0x30]: PERF_EVENT_READ: 6268 6269 raw 0x1000008e0 473471
>> [...]
>>
>> Now I know which thread actually reads from where.
>> Looks like we're good to go now! thanks a lot Peter!
>>
>> Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
>
> Thanks Brice.
>
> It would be nice to add this as some "perf report -s/--stats" flag,
> to not have to go via -D (which is a 'print debug output' kind of
> ad-hoc thing and subject to format changes in the future).
>
> Would you be interested in sending a patch that adds that flag to
> 'perf report', to print out these statistics entries (if any), in a
> tabular form suitable for your purposes? Below is a past patch to
> builtin-report.c that shows how to add new options.

I'll see what I can do. I looked at the code tonight before Peter fixed
the last problem, but I haven't managed to understand much of it so far.
So thanks for the example patch :)

Brice
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-06 19:59 ` Ingo Molnar
2009-08-06 20:03 ` Brice Goglin
@ 2009-08-06 23:35 ` Brice Goglin
2009-08-07 6:13 ` Brice Goglin
2009-08-07 6:32 ` Ingo Molnar
1 sibling, 2 replies; 41+ messages in thread
From: Brice Goglin @ 2009-08-06 23:35 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML

Ingo Molnar wrote:
> It would be nice to add this as some "perf report -s/--stats" flag,
> to not have to go via -D (which is a 'print debug output' kind of
> ad-hoc thing and subject to format changes in the future).
>
> Would you be interested in sending a patch that adds that flag to
> 'perf report', to print out these statistics entries (if any), in a
> tabular form suitable for your purposes? Below is a past patch to
> builtin-report.c that shows how to add new options.

Here's a quick'n'dirty first try. Read events are copied into the
show_stat_event array during process_read_event, and __cmd_report sorts
the array by tid before displaying it.

perf report -S now shows the following after the existing output (-s is
already used for something else):

# Per-thread statistics:
# PID   TID   Event             Count
 16709 16709 cache-misses      82727
 16709 16709 cache-references  41238768
 16709 16710 cache-misses      6462
 16709 16710 cache-references  76119375

or

# Per-thread statistics:
# PID  TID  Event            Count
 6268 6268 raw 0x1000001e0  494628
 6268 6268 raw 0x1000002e0  209113
 6268 6268 raw 0x1000004e0  307215
 6268 6268 raw 0x1000008e0  9203221
 6268 6269 raw 0x1000001e0  9210788
 6268 6269 raw 0x1000002e0  302344
 6268 6269 raw 0x1000004e0  198705
 6268 6269 raw 0x1000008e0  473471

Obviously, there's a lot of nice pretty printing to do, but you'll
be able to tell whether the general idea is ok or not.
Brice Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Index: linux-2.6.31/tools/perf/builtin-report.c =================================================================== --- linux-2.6.31.orig/tools/perf/builtin-report.c 2009-08-07 00:42:55.000000000 +0200 +++ linux-2.6.31/tools/perf/builtin-report.c 2009-08-07 01:26:33.000000000 +0200 @@ -24,6 +24,8 @@ #include "util/parse-options.h" #include "util/parse-events.h" +#include <stdlib.h> + #define SHOW_KERNEL 1 #define SHOW_USER 2 #define SHOW_HV 4 @@ -52,6 +54,10 @@ static int full_paths; static int show_nr_samples; +static int show_stat; +static int show_stat_events; +static int show_stat_event_max; +static struct read_event *show_stat_event; static unsigned long page_size; static unsigned long mmap_window = 32; @@ -126,6 +132,8 @@ struct read_event read; } event_t; +static struct perf_counter_attr *perf_header__find_attr(u64 id); + static int repsep_fprintf(FILE *fp, const char *fmt, ...) { int n; @@ -1350,6 +1358,13 @@ } } +static int compar_read_event_by_tid(const void *e1, const void *e2) +{ + const struct read_event *event1 = e1; + const struct read_event *event2 = e2; + return event1->tid - event2->tid; +} + static size_t output__fprintf(FILE *fp, u64 total_samples) { struct hist_entry *pos; @@ -1430,6 +1445,21 @@ } fprintf(fp, "\n"); + if (show_stat && show_stat_events) { + int i; + qsort(&show_stat_event[0], show_stat_events, sizeof(struct read_event), compar_read_event_by_tid); + fprintf(fp, "# Per-thread statistics:\n"); + fprintf(fp, "# PID TID Event Count\n"); + for(i=0; i<show_stat_events; i++) { + struct read_event *event = &show_stat_event[i]; + struct perf_counter_attr *attr = perf_header__find_attr(event->id); + printf(" %d %d %s %Lu\n", + event->pid, event->tid, + attr ? 
__event_name(attr->type, attr->config) : "unknown", + event->value); + } + } + return ret; } @@ -1703,6 +1733,24 @@ { struct perf_counter_attr *attr = perf_header__find_attr(event->read.id); + if (show_stat) { + if (!show_stat_event) { + show_stat_events = 0; + show_stat_event_max = 16; + show_stat_event = malloc(show_stat_event_max * sizeof(*show_stat_event)); + if (!show_stat_event) + die("cannot allocate show_stat_event array"); + } + if (show_stat_events == show_stat_event_max) { + show_stat_event_max *= 2; + show_stat_event = realloc(show_stat_event, show_stat_event_max * sizeof(*show_stat_event)); + if (!show_stat_event) + die("cannot enlarge show_stat_event array"); + } + memcpy(&show_stat_event[show_stat_events], &event->read, sizeof(struct read_event)); + show_stat_events++; + } + dprintf("%p [%p]: PERF_EVENT_READ: %d %d %s %Lu\n", (void *)(offset + head), (void *)(long)(event->header.size), @@ -1998,6 +2046,8 @@ "Show a column with the number of samples"), OPT_STRING('s', "sort", &sort_order, "key[,key2...]", "sort by key(s): pid, comm, dso, symbol, parent"), + OPT_BOOLEAN('S', "stat", &show_stat, + "show per-thread event counters"), OPT_BOOLEAN('P', "full-paths", &full_paths, "Don't shorten the pathnames taking into account the cwd"), OPT_STRING('p', "parent", &parent_pattern, "regex", ^ permalink raw reply [flat|nested] 41+ messages in thread
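The collect, sort and print flow of the patch above can be sketched in isolation. This is only an illustration under stated assumptions: struct stat_row, struct stat_rows and the three helpers are hypothetical names, the event name is assumed to be resolved already, and allocation failure is reported to the caller instead of calling die(). The comparator uses comparisons rather than subtraction, so it does not depend on how a difference of unsigned tids converts to int.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical row type; the patch itself stores struct read_event. */
struct stat_row {
	uint32_t pid, tid;
	uint64_t value;
	const char *name;	/* already-resolved event name */
};

struct stat_rows {
	struct stat_row *rows;
	size_t nr, max;
};

/* Append one row, doubling the capacity when full (16 entries to start). */
static int stat_rows__add(struct stat_rows *s, const struct stat_row *row)
{
	if (s->nr == s->max) {
		size_t max = s->max ? s->max * 2 : 16;
		struct stat_row *tmp = realloc(s->rows, max * sizeof(*tmp));

		if (!tmp)
			return -1;	/* s->rows is still valid here */
		s->rows = tmp;
		s->max = max;
	}
	s->rows[s->nr++] = *row;

	return 0;
}

/* Order by tid using comparisons only. */
static int compare_stat_row_by_tid(const void *e1, const void *e2)
{
	const struct stat_row *row1 = e1;
	const struct stat_row *row2 = e2;

	return (row1->tid > row2->tid) - (row1->tid < row2->tid);
}

/* Sort and print the rows; returns the number of rows printed. */
static size_t stat_rows__fprintf(struct stat_rows *s, FILE *fp)
{
	size_t i;

	if (!s->nr)
		return 0;

	qsort(s->rows, s->nr, sizeof(*s->rows), compare_stat_row_by_tid);

	fprintf(fp, "# Per-thread statistics:\n");
	fprintf(fp, "# PID TID Event Count\n");
	for (i = 0; i < s->nr; i++) {
		const struct stat_row *row = &s->rows[i];

		fprintf(fp, " %u %u %s %llu\n", (unsigned)row->pid,
			(unsigned)row->tid, row->name,
			(unsigned long long)row->value);
	}

	return s->nr;
}
```

A zero-initialised struct stat_rows is a valid empty set, because realloc() with a NULL pointer behaves like malloc(). Note that qsort() is not guaranteed to be stable, so rows of the same thread may come out in a different relative order than they were added.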
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-06 23:35 ` Brice Goglin
@ 2009-08-07 6:13 ` Brice Goglin
2009-08-07 6:32 ` Ingo Molnar
1 sibling, 0 replies; 41+ messages in thread
From: Brice Goglin @ 2009-08-07 6:13 UTC (permalink / raw)
Cc: Ingo Molnar, Peter Zijlstra, paulus, LKML

Brice Goglin wrote:
> Ingo Molnar wrote:
>
>> It would be nice to add this as some "perf report -s/--stats" flag,
>> to not have to go via -D (which is a 'print debug output' kind of
>> ad-hoc thing and subject to format changes in the future).
>>
>> Would you be interested in sending a patch that adds that flag to
>> 'perf report', to print out these statistics entries (if any), in a
>> tabular form suitable for your purposes? Below is a past patch to
>> builtin-report.c that shows how to add new options.
>
> Here's a quick'n'dirty first try. Read events are copied into the
> show_stat_event array during process_read_event, and __cmd_report sorts
> the array by tid before displaying it.
>
> perf report -S now shows the following after the existing output (-s is
> already used for something else):
>
> # Per-thread statistics:
> # PID   TID   Event             Count
>  16709 16709 cache-misses      82727
>  16709 16709 cache-references  41238768
>  16709 16710 cache-misses      6462
>  16709 16710 cache-references  76119375
>
> or
>
> # Per-thread statistics:
> # PID  TID  Event            Count
>  6268 6268 raw 0x1000001e0  494628
>  6268 6268 raw 0x1000002e0  209113
>  6268 6268 raw 0x1000004e0  307215
>  6268 6268 raw 0x1000008e0  9203221
>  6268 6269 raw 0x1000001e0  9210788
>  6268 6269 raw 0x1000002e0  302344
>  6268 6269 raw 0x1000004e0  198705
>  6268 6269 raw 0x1000008e0  473471
>
> Obviously, there's a lot of nice pretty printing to do, but you'll
> be able to tell whether the general idea is ok or not.

Is there an easy way to know how many threads and how many different
event types were recorded? Once I know that, I could gather the values
directly into a 2D matrix and display a single line per thread with all
the corresponding event counters (so that people can easily apply awk to
manipulate them).

I am adding some pretty printing to the previous patch and waiting for
your feedback (for instance, which global variable names and
command-line option names you would prefer).

Brice

PS: Is there a perf counters mailing list?
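One way to get the single-line-per-thread layout without knowing the counts in advance is to discover the distinct threads and events while filling the matrix. The sketch below is only an illustration: struct demo_row, struct demo_matrix, the fixed limits and both functions are hypothetical, and multiple reads for the same (thread, event) pair are assumed to be summed.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_THREADS 64	/* arbitrary limits for the sketch */
#define DEMO_MAX_EVENTS  8

/* Hypothetical flattened read event: thread, raw config, counter value. */
struct demo_row {
	uint32_t tid;
	uint64_t config;
	uint64_t value;
};

struct demo_matrix {
	size_t nr_threads, nr_events;
	uint32_t tid[DEMO_MAX_THREADS];
	uint64_t config[DEMO_MAX_EVENTS];
	uint64_t value[DEMO_MAX_THREADS][DEMO_MAX_EVENTS];
};

/* Fill the matrix, adding a row or column the first time a tid or config
 * is seen. Returns -1 when a limit is exceeded. */
static int demo_matrix__build(struct demo_matrix *m,
			      const struct demo_row *rows, size_t nr)
{
	size_t i, t, e;

	memset(m, 0, sizeof(*m));

	for (i = 0; i < nr; i++) {
		for (t = 0; t < m->nr_threads; t++)
			if (m->tid[t] == rows[i].tid)
				break;
		if (t == m->nr_threads) {
			if (t == DEMO_MAX_THREADS)
				return -1;
			m->tid[m->nr_threads++] = rows[i].tid;
		}

		for (e = 0; e < m->nr_events; e++)
			if (m->config[e] == rows[i].config)
				break;
		if (e == m->nr_events) {
			if (e == DEMO_MAX_EVENTS)
				return -1;
			m->config[m->nr_events++] = rows[i].config;
		}

		m->value[t][e] += rows[i].value;
	}

	return 0;
}

/* One header line, then one whitespace-separated line per thread. */
static void demo_matrix__fprintf(const struct demo_matrix *m, FILE *fp)
{
	size_t t, e;

	fprintf(fp, "# TID");
	for (e = 0; e < m->nr_events; e++)
		fprintf(fp, " 0x%llx", (unsigned long long)m->config[e]);
	fprintf(fp, "\n");

	for (t = 0; t < m->nr_threads; t++) {
		fprintf(fp, "%u", (unsigned)m->tid[t]);
		for (e = 0; e < m->nr_events; e++)
			fprintf(fp, " %llu", (unsigned long long)m->value[t][e]);
		fprintf(fp, "\n");
	}
}
```

The linear searches keep the sketch short; they are quadratic in the worst case, which only matters for very large thread counts. The output has one field per event, so a column can be selected with awk by position.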
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-06 23:35 ` Brice Goglin
2009-08-07 6:13 ` Brice Goglin
@ 2009-08-07 6:32 ` Ingo Molnar
2009-08-07 7:38 ` Brice Goglin
2009-08-07 11:55 ` [PATCH] perf report: Display per-thread event counters Brice Goglin
1 sibling, 2 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-08-07 6:32 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML

* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> Ingo Molnar wrote:
> > It would be nice to add this as some "perf report -s/--stats" flag,
> > to not have to go via -D (which is a 'print debug output' kind of
> > ad-hoc thing and subject to format changes in the future).
> >
> > Would you be interested in sending a patch that adds that flag
> > to 'perf report', to print out these statistics entries (if
> > any), in a tabular form suitable for your purposes? Below is a
> > past patch to builtin-report.c that shows how to add new
> > options.
>
> Here's a quick'n'dirty first try. Read events are copied into the
> show_stat_event array during process_read_event, and __cmd_report
> sorts the array by tid before displaying it.
>
> perf report -S now shows the following after the existing output (-s is
> already used for something else):
>
> # Per-thread statistics:
> # PID   TID   Event             Count
>  16709 16709 cache-misses      82727
>  16709 16709 cache-references  41238768
>  16709 16710 cache-misses      6462
>  16709 16710 cache-references  76119375
>
> or
>
> # Per-thread statistics:
> # PID  TID  Event            Count
>  6268 6268 raw 0x1000001e0  494628
>  6268 6268 raw 0x1000002e0  209113
>  6268 6268 raw 0x1000004e0  307215
>  6268 6268 raw 0x1000008e0  9203221
>  6268 6269 raw 0x1000001e0  9210788
>  6268 6269 raw 0x1000002e0  302344
>  6268 6269 raw 0x1000004e0  198705
>  6268 6269 raw 0x1000008e0  473471
>
> Obviously, there's a lot of nice pretty printing to do, but
> you'll be able to tell whether the general idea is ok or not.

Yeah, the general idea looks OK to me. I think it would be nice to
share the pretty-printing code with 'perf stat' (builtin-stat.c).

Plus, to make it easier for your scripting needs, we could add a
--pretty=raw type of flag to both perf stat and perf report, which
would emit the data in a raw way - to be used in gnuplot almost
straight away, etc.

But for the typical interactive use it would be nice to do the 2D
tabular form that 'perf stat' does. (btw: feel free to enhance that
output as well, where it seems appropriate)

Here are a few mostly stylistic comments:

> static int full_paths;
> static int show_nr_samples;
> +static int show_stat;
> +static int show_stat_events;
> +static int show_stat_event_max;
> +static struct read_event *show_stat_event;

> @@ -126,6 +132,8 @@
> 	struct read_event read;
> } event_t;
>
> +static struct perf_counter_attr *perf_header__find_attr(u64 id);

Small cleanliness detail: could this function be moved to this spot?
That way we could avoid this prototype declaration.

> +static int compar_read_event_by_tid(const void *e1, const void *e2)
> +{
> +	const struct read_event *event1 = e1;
> +	const struct read_event *event2 = e2;
> +	return event1->tid - event2->tid;
> +}

another small detail: i'd suggest s/compar/compare, and please put a
newline after local variables, i.e. something like:

static int compare_read_event_by_tid(const void *e1, const void *e2)
{
	const struct read_event *event1 = e1;
	const struct read_event *event2 = e2;

	return event1->tid - event2->tid;
}

> @@ -1430,6 +1445,21 @@
> 	}
> 	fprintf(fp, "\n");
>
> +	if (show_stat && show_stat_events) {
> +		int i;

[please add a newline here too]

> +		qsort(&show_stat_event[0], show_stat_events, sizeof(struct read_event), compar_read_event_by_tid);
> +		fprintf(fp, "# Per-thread statistics:\n");
> +		fprintf(fp, "# PID TID Event Count\n");
> +		for(i=0; i<show_stat_events; i++) {

We write loops a tiny bit differently in the kernel - there's an
easy way to check such details: run your patch through
scripts/checkpatch.pl.

> +			struct read_event *event = &show_stat_event[i];
> +			struct perf_counter_attr *attr = perf_header__find_attr(event->id);

[please add a newline here too]

> +			printf(" %d %d %s %Lu\n",
> +				event->pid, event->tid,
> +				attr ? __event_name(attr->type, attr->config) : "unknown",
> +				event->value);
> +		}
> +	}

i'd also suggest putting this block into a helper inline function,
which can start with:

	if (!show_stat || !show_stat_events)
		return;

That way it looks a (tiny) bit more structured and we win an
indentation level.

> +	if (show_stat) {
> +		if (!show_stat_event) {
> +			show_stat_events = 0;
> +			show_stat_event_max = 16;
> +			show_stat_event = malloc(show_stat_event_max * sizeof(*show_stat_event));
> +			if (!show_stat_event)
> +				die("cannot allocate show_stat_event array");
> +		}
> +		if (show_stat_events == show_stat_event_max) {
> +			show_stat_event_max *= 2;
> +			show_stat_event = realloc(show_stat_event, show_stat_event_max * sizeof(*show_stat_event));
> +			if (!show_stat_event)
> +				die("cannot enlarge show_stat_event array");
> +		}
> +		memcpy(&show_stat_event[show_stat_events], &event->read, sizeof(struct read_event));
> +		show_stat_events++;
> +	}

this too could move into a helper inline function.

> @@ -1998,6 +2046,8 @@
> 		    "Show a column with the number of samples"),
> 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
> 		   "sort by key(s): pid, comm, dso, symbol, parent"),
> +	OPT_BOOLEAN('S', "stat", &show_stat,
> +		    "show per-thread event counters"),

Ok, there's indeed a flag clash with -s/--sort as you noticed.

-S looks good to me, how about going one step further and changing
perf record to use -S/--stat as well, to make the flag consistent
across all tools?

In any case, this patch moves in the right direction - this kind of
functionality is exactly what we need.

	Ingo
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-07 6:32 ` Ingo Molnar
@ 2009-08-07 7:38 ` Brice Goglin
2009-08-07 7:45 ` Ingo Molnar
2009-08-07 11:55 ` [PATCH] perf report: Display per-thread event counters Brice Goglin
1 sibling, 1 reply; 41+ messages in thread
From: Brice Goglin @ 2009-08-07 7:38 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML

Ingo Molnar wrote:
> Ok, there's indeed a flag clash with -s/--sort as you noticed.
>
> -S looks good to me, how about going one step further and changing
> perf record to use -S/--stat as well, to make the flag consistent
> across all tools?

Sure, but perf stat already uses -S for --scale :)

Brice
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-07 7:38 ` Brice Goglin
@ 2009-08-07 7:45 ` Ingo Molnar
2009-08-07 8:18 ` Brice Goglin
0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2009-08-07 7:45 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML

* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> Ingo Molnar wrote:
> > Ok, there's indeed a flag clash with -s/--sort as you noticed.
> >
> > -S looks good to me, how about going one step further and changing
> > perf record to use -S/--stat as well, to make the flag consistent
> > across all tools?
>
> Sure, but perf stat already uses -S for --scale :)

heh :-)

I completely forgot about it. And since we now have --scale enabled
by default (and it doesn't make much sense to disable it i guess), it
would not be a big issue to rename that to -c/--scale to free up that
flag for -S/--stat?

	Ingo
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-07 7:45 ` Ingo Molnar
@ 2009-08-07 8:18 ` Brice Goglin
2009-08-07 8:23 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Brice Goglin @ 2009-08-07 8:18 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML

Ingo Molnar wrote:
> I completely forgot about it. And since we now have --scale enabled
> by default (and it doesn't make much sense to disable it, I guess), it
> would not be a big issue to rename that to -c/--scale to free up that
> flag for -S/--stat?
>
> 	Ingo

Here you are. But how do you switch "scale" off now that it's
enabled by default? I tried --scale=0, --no-scale and so on
without apparently succeeding.

Brice

[PATCH] perf stat: change -S for --scale into -c

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>

Index: linux-rc/tools/perf/Documentation/perf-stat.txt
===================================================================
--- linux-rc.orig/tools/perf/Documentation/perf-stat.txt	2009-08-07 10:09:40.000000000 +0200
+++ linux-rc/tools/perf/Documentation/perf-stat.txt	2009-08-07 10:09:46.000000000 +0200
@@ -40,7 +40,7 @@
 -a::
 	system-wide collection
 
--S::
+-c::
 	scale counter values
 
 EXAMPLES
Index: linux-rc/tools/perf/builtin-stat.c
===================================================================
--- linux-rc.orig/tools/perf/builtin-stat.c	2009-08-07 09:46:49.000000000 +0200
+++ linux-rc/tools/perf/builtin-stat.c	2009-08-07 10:07:44.000000000 +0200
@@ -496,7 +496,7 @@
 		    "stat events on existing pid"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 			    "system-wide collection from all CPUs"),
-	OPT_BOOLEAN('S', "scale", &scale,
+	OPT_BOOLEAN('c', "scale", &scale,
 			    "scale/normalize counters"),
 	OPT_BOOLEAN('v', "verbose", &verbose,
 			    "be more verbose (show counter open errors, etc)"),
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-07 8:18 ` Brice Goglin
@ 2009-08-07 8:23 ` Ingo Molnar
2009-08-07 8:27 ` Ingo Molnar
2009-08-07 8:30 ` [tip:perfcounters/core] perf stat: Rename -S/--scale to -c/--scale tip-bot for Brice Goglin
2 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-08-07 8:23 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML

* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> Ingo Molnar wrote:
> > I completely forgot about it. And since we now have --scale enabled
> > by default (and it doesn't make much sense to disable it, I guess), it
> > would not be a big issue to rename that to -c/--scale to free up that
> > flag for -S/--stat?
> >
> > 	Ingo
>
> Here you are. But how do you switch "scale" off now that it's
> enabled by default? I tried --scale=0, --no-scale and so on
> without apparently succeeding.

--no-scale seems to work here:

aldebaran:~> perf stat --no-scale -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 ~/loop_1b_instructions

 Performance counter stats for '/home/mingo/loop_1b_instructions':

      469128160  cycles
      469072870  cycles
      470517049  cycles
      473750835  cycles
      476987439  cycles
      477364955  cycles
      474188871  cycles
      470950809  cycles

    0.237171581  seconds time elapsed

aldebaran:~> perf stat --scale -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 -e 0:0 ~/loop_1b_instructions

 Performance counter stats for '/home/mingo/loop_1b_instructions':

      756242908  cycles   (scaled from 62.00%)
      756238670  cycles   (scaled from 62.00%)
      756237417  cycles   (scaled from 62.23%)
      756093940  cycles   (scaled from 62.66%)
      756097133  cycles   (scaled from 63.08%)
      756141523  cycles   (scaled from 63.10%)
      756142674  cycles   (scaled from 62.68%)
      756154220  cycles   (scaled from 62.26%)

    0.236972028  seconds time elapsed

	Ingo
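[Editor's sketch, not part of the original message.] The "(scaled from 62.00%)" annotations above suggest that each of the eight cycle counters was only on the hardware for about 62% of the run, and that the reported value is an extrapolation to the full run. The arithmetic can be sketched as below, assuming the usual enabled-time/running-time ratio; the helper names scale_count() and running_percent() are hypothetical and are not taken from builtin-stat.c. As a sanity check, a raw count of about 469 million at 62% comes out at roughly 757 million, which is in line with the scaled figures Ingo shows.

```c
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): extrapolate a counter value
 * that was only counting for part of the time it was enabled.
 *
 *   raw     - value accumulated while the counter was actually counting
 *   enabled - total time the counter was enabled
 *   running - time the counter was actually scheduled on the hardware
 */
static uint64_t scale_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
	if (running == 0)
		return 0;	/* never scheduled: nothing to extrapolate from */
	if (running >= enabled)
		return raw;	/* counted the whole time: no scaling needed */

	/* Round to nearest instead of truncating. */
	return (uint64_t)((double)raw * (double)enabled / (double)running + 0.5);
}

/* Percentage that would go into a "(scaled from NN.NN%)" style annotation. */
static double running_percent(uint64_t enabled, uint64_t running)
{
	return enabled ? 100.0 * (double)running / (double)enabled : 0.0;
}
```

With --no-scale the raw value would be printed unchanged, which matches the lower numbers in the first run above.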
* Re: [PATCH] perf tools: Fix reading of perf.data file header
2009-08-07 8:18 ` Brice Goglin
2009-08-07 8:23 ` Ingo Molnar
@ 2009-08-07 8:27 ` Ingo Molnar
2009-08-07 8:30 ` [tip:perfcounters/core] perf stat: Rename -S/--scale to -c/--scale tip-bot for Brice Goglin
2 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-08-07 8:27 UTC (permalink / raw)
To: Brice Goglin; +Cc: Peter Zijlstra, paulus, LKML

* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> [PATCH] perf stat: change -S for --scale into -c

Applied, thanks Brice! Find below the full changelog - I filled in
the 'why' bits.

	Ingo

------------------------->
From d4b2db0a624500c978f14393e6933d5d6aa552d2 Mon Sep 17 00:00:00 2001
From: Brice Goglin <Brice.Goglin@inria.fr>
Date: Fri, 7 Aug 2009 10:18:39 +0200
Subject: [PATCH] perf stat: Rename -S/--scale to -c/--scale

We want to use a coherent flag for -S/--stat across all tools,
so free up -S in perf stat.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
LKML-Reference: <4A7BE35F.4060604@inria.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 tools/perf/Documentation/perf-stat.txt |    2 +-
 tools/perf/builtin-stat.c              |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 0d74346..484080d 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -40,7 +40,7 @@ OPTIONS
 -a::
 	system-wide collection
 
--S::
+-c::
 	scale counter values
 
 EXAMPLES
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f9510ee..b4b06c7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -496,7 +496,7 @@ static const struct option options[] = {
 		    "stat events on existing pid"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 			    "system-wide collection from all CPUs"),
-	OPT_BOOLEAN('S', "scale", &scale,
+	OPT_BOOLEAN('c', "scale", &scale,
 			    "scale/normalize counters"),
 	OPT_BOOLEAN('v', "verbose", &verbose,
 			    "be more verbose (show counter open errors, etc)"),
* [tip:perfcounters/core] perf stat: Rename -S/--scale to -c/--scale
2009-08-07 8:18 ` Brice Goglin
2009-08-07 8:23 ` Ingo Molnar
2009-08-07 8:27 ` Ingo Molnar
@ 2009-08-07 8:30 ` tip-bot for Brice Goglin
2 siblings, 0 replies; 41+ messages in thread
From: tip-bot for Brice Goglin @ 2009-08-07 8:30 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, Brice.Goglin, hpa, mingo, a.p.zijlstra, tglx, mingo

Commit-ID:  d4b2db0a624500c978f14393e6933d5d6aa552d2
Gitweb:     http://git.kernel.org/tip/d4b2db0a624500c978f14393e6933d5d6aa552d2
Author:     Brice Goglin <Brice.Goglin@inria.fr>
AuthorDate: Fri, 7 Aug 2009 10:18:39 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 7 Aug 2009 10:26:12 +0200

perf stat: Rename -S/--scale to -c/--scale

We want to use a coherent flag for -S/--stat across all tools,
so free up -S in perf stat.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
LKML-Reference: <4A7BE35F.4060604@inria.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 tools/perf/Documentation/perf-stat.txt |    2 +-
 tools/perf/builtin-stat.c              |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 0d74346..484080d 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -40,7 +40,7 @@ OPTIONS
 -a::
 	system-wide collection
 
--S::
+-c::
 	scale counter values
 
 EXAMPLES
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f9510ee..b4b06c7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -496,7 +496,7 @@ static const struct option options[] = {
 		    "stat events on existing pid"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 			    "system-wide collection from all CPUs"),
-	OPT_BOOLEAN('S', "scale", &scale,
+	OPT_BOOLEAN('c', "scale", &scale,
 			    "scale/normalize counters"),
 	OPT_BOOLEAN('v', "verbose", &verbose,
 			    "be more verbose (show counter open errors, etc)"),
* [PATCH] perf report: Display per-thread event counters
2009-08-07 6:32 ` Ingo Molnar
2009-08-07 7:38 ` Brice Goglin
@ 2009-08-07 11:55 ` Brice Goglin
2009-08-08 11:54 ` [tip:perfcounters/core] perf report: Fix and improve the displaying of " tip-bot for Brice Goglin
2009-08-08 12:14 ` [PATCH] perf report: Display " Ingo Molnar
1 sibling, 2 replies; 41+ messages in thread
From: Brice Goglin @ 2009-08-07 11:55 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, paulus, LKML

Here's a better patch. I moved everything to util/values.[ch] so that
we may reuse it in perf stat. But I don't see yet where I am supposed
to get something like PERF_EVENT_READ in builtin-stat.c so I haven't
touched it yet.

We get something like this now:
#  PID  TID cache-misses cache-references
  4658 4659       495581          3238779
  4658 4662       498246          3236823
  4658 4663       499531          3243162

Then it'll be easy to add --pretty=raw to display a single line per
thread/event.

By the way, -S was also used for --symbol... So I used -T/--threads
here.

perf report: Add -T/--threads to display per-thread counter values

We get something like this now:
#  PID  TID cache-misses cache-references
  4658 4659       495581          3238779
  4658 4662       498246          3236823
  4658 4663       499531          3243162

Per-thread arrays of counter values are managed in util/values.[ch]

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index e72e931..370344a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -27,6 +27,9 @@ OPTIONS
 -n
 --show-nr-samples
 	Show the number of samples for each symbol
+-T
+--threads
+	Show per-thread event counters
 -C::
 --comms=::
 	Only consider symbols in these comms. 
CSV that understands diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 1916e44..9fc133f 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -310,6 +310,7 @@ LIB_H += util/sigchain.h LIB_H += util/symbol.h LIB_H += util/module.h LIB_H += util/color.h +LIB_H += util/values.h LIB_OBJS += util/abspath.o LIB_OBJS += util/alias.o @@ -337,6 +338,7 @@ LIB_OBJS += util/color.o LIB_OBJS += util/pager.o LIB_OBJS += util/header.o LIB_OBJS += util/callchain.o +LIB_OBJS += util/values.o BUILTIN_OBJS += builtin-annotate.o BUILTIN_OBJS += builtin-help.o diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index a5e2f8d..f7be85e 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -17,6 +17,7 @@ #include "util/string.h" #include "util/callchain.h" #include "util/strlist.h" +#include "util/values.h" #include "perf.h" #include "util/header.h" @@ -53,6 +54,9 @@ static int modules; static int full_paths; static int show_nr_samples; +static int show_threads; +static struct perf_read_values show_threads_values; + static unsigned long page_size; static unsigned long mmap_window = 32; @@ -1432,6 +1436,9 @@ print_entries: } fprintf(fp, "\n"); + if (show_threads) + perf_read_values_display(fp, &show_threads_values); + return ret; } @@ -1717,6 +1724,16 @@ process_read_event(event_t *event, unsigned long offset, unsigned long head) { struct perf_counter_attr *attr = perf_header__find_attr(event->read.id); + if (show_threads) { + char *name = attr ? 
__event_name(attr->type, attr->config) + : "unknown"; + perf_read_values_add_value(&show_threads_values, + event->read.pid, event->read.tid, + event->read.id, + name, + event->read.value); + } + dprintf("%p [%p]: PERF_EVENT_READ: %d %d %s %Lu\n", (void *)(offset + head), (void *)(long)(event->header.size), @@ -1798,6 +1815,9 @@ static int __cmd_report(void) register_idle_thread(); + if (show_threads) + perf_read_values_init(&show_threads_values); + input = open(input_name, O_RDONLY); if (input < 0) { fprintf(stderr, " failed to open file: %s", input_name); @@ -1945,6 +1965,9 @@ done: output__resort(total); output__fprintf(stdout, total); + if (show_threads) + perf_read_values_destroy(&show_threads_values); + return rc; } @@ -2011,6 +2034,8 @@ static const struct option options[] = { "load module symbols - WARNING: use only with -k and LIVE kernel"), OPT_BOOLEAN('n', "show-nr-samples", &show_nr_samples, "Show a column with the number of samples"), + OPT_BOOLEAN('T', "threads", &show_threads, + "Show per-thread event counters"), OPT_STRING('s', "sort", &sort_order, "key[,key2...]", "sort by key(s): pid, comm, dso, symbol, parent"), OPT_BOOLEAN('P', "full-paths", &full_paths, diff --git a/tools/perf/util/values.c b/tools/perf/util/values.c new file mode 100644 index 0000000..8551c0b --- /dev/null +++ b/tools/perf/util/values.c @@ -0,0 +1,171 @@ +#include <stdlib.h> + +#include "util.h" +#include "values.h" + +void perf_read_values_init(struct perf_read_values *values) +{ + values->threads_max = 16; + values->pid = malloc(values->threads_max * sizeof(*values->pid)); + values->tid = malloc(values->threads_max * sizeof(*values->tid)); + values->value = malloc(values->threads_max * sizeof(*values->value)); + if (!values->pid || !values->tid || !values->value) + die("failed to allocate read_values threads arrays"); + values->threads = 0; + + values->counters_max = 16; + values->counterrawid = malloc(values->counters_max + * sizeof(*values->counterrawid)); + 
values->countername = malloc(values->counters_max + * sizeof(*values->countername)); + if (!values->counterrawid || !values->countername) + die("failed to allocate read_values counters arrays"); + values->counters = 0; +} + +void perf_read_values_destroy(struct perf_read_values *values) +{ + int i; + + if (!values->threads_max || !values->counters_max) + return; + + for (i = 0; i < values->threads; i++) + free(values->value[i]); + free(values->pid); + free(values->tid); + free(values->counterrawid); + for (i = 0; i < values->counters; i++) + free(values->countername[i]); + free(values->countername); +} + +static void perf_read_values__enlarge_threads(struct perf_read_values *values) +{ + values->threads_max *= 2; + values->pid = realloc(values->pid, + values->threads_max * sizeof(*values->pid)); + values->tid = realloc(values->tid, + values->threads_max * sizeof(*values->tid)); + values->value = realloc(values->value, + values->threads_max * sizeof(*values->value)); + if (!values->pid || !values->tid || !values->value) + die("failed to enlarge read_values threads arrays"); +} + +static int perf_read_values__findnew_thread(struct perf_read_values *values, + u32 pid, u32 tid) +{ + int i; + + for (i = 0; i < values->threads; i++) + if (values->pid[i] == pid && values->tid[i] == tid) + return i; + + if (values->threads == values->threads_max) + perf_read_values__enlarge_threads(values); + + i = values->threads++; + values->pid[i] = pid; + values->tid[i] = tid; + values->value[i] = malloc(values->counters_max * sizeof(**values->value)); + if (!values->value[i]) + die("failed to allocate read_values counters array"); + + return i; +} + +static void perf_read_values__enlarge_counters(struct perf_read_values *values) +{ + int i; + + values->counters_max *= 2; + values->counterrawid = realloc(values->counterrawid, + values->counters_max * sizeof(*values->counterrawid)); + values->countername = realloc(values->countername, + values->counters_max * 
sizeof(*values->countername)); + if (!values->counterrawid || !values->countername) + die("failed to enlarge read_values counters arrays"); + + for (i = 0; i < values->threads; i++) { + values->value[i] = realloc(values->value[i], + values->counters_max * sizeof(**values->value)); + if (!values->value[i]) + die("failed to enlarge read_values counters arrays"); + } +} + +static int perf_read_values__findnew_counter(struct perf_read_values *values, + u64 rawid, char *name) +{ + int i; + + for (i = 0; i < values->counters; i++) + if (values->counterrawid[i] == rawid) + return i; + + if (values->counters == values->counters_max) + perf_read_values__enlarge_counters(values); + + i = values->counters++; + values->counterrawid[i] = rawid; + values->countername[i] = strdup(name); + + return i; +} + +void perf_read_values_add_value(struct perf_read_values *values, + u32 pid, u32 tid, + u64 rawid, char *name, u64 value) +{ + int tindex, cindex; + + tindex = perf_read_values__findnew_thread(values, pid, tid); + cindex = perf_read_values__findnew_counter(values, rawid, name); + + values->value[tindex][cindex] = value; +} + +void perf_read_values_display(FILE *fp, struct perf_read_values *values) +{ + int i, j; + int pidwidth, tidwidth; + int *counterwidth; + + counterwidth = malloc(values->counters * sizeof(*counterwidth)); + if (!counterwidth) + die("failed to allocate counterwidth array"); + tidwidth = 3; + pidwidth = 3; + for (j = 0; j < values->counters; j++) + counterwidth[j] = strlen(values->countername[j]); + for (i = 0; i < values->threads; i++) { + int width; + + width = snprintf(NULL, 0, "%d", values->pid[i]); + if (width > pidwidth) + pidwidth = width; + width = snprintf(NULL, 0, "%d", values->tid[i]); + if (width > tidwidth) + tidwidth = width; + for (j = 0; j < values->counters; j++) { + width = snprintf(NULL, 0, "%Lu", values->value[i][j]); + if (width > counterwidth[j]) + counterwidth[j] = width; + } + } + + fprintf(fp, "# %*s %*s", pidwidth, "PID", tidwidth, 
"TID"); + for (j = 0; j < values->counters; j++) + fprintf(fp, " %*s", counterwidth[j], values->countername[j]); + fprintf(fp, "\n"); + + for (i = 0; i < values->threads; i++) { + fprintf(fp, " %*d %*d", pidwidth, values->pid[i], + tidwidth, values->tid[i]); + for (j = 0; j < values->counters; j++) + fprintf(fp, " %*Lu", + counterwidth[j], values->value[i][j]); + fprintf(fp, "\n"); + } +} diff --git a/tools/perf/util/values.h b/tools/perf/util/values.h new file mode 100644 index 0000000..e41be5e --- /dev/null +++ b/tools/perf/util/values.h @@ -0,0 +1,26 @@ +#ifndef _PERF_VALUES_H +#define _PERF_VALUES_H + +#include "types.h" + +struct perf_read_values { + int threads; + int threads_max; + u32 *pid, *tid; + int counters; + int counters_max; + u64 *counterrawid; + char **countername; + u64 **value; +}; + +void perf_read_values_init(struct perf_read_values *values); +void perf_read_values_destroy(struct perf_read_values *values); + +void perf_read_values_add_value(struct perf_read_values *values, + u32 pid, u32 tid, + u64 rawid, char *name, u64 value); + +void perf_read_values_display(FILE *fp, struct perf_read_values *values); + +#endif /* _PERF_VALUES_H */ ^ permalink raw reply related [flat|nested] 41+ messages in thread
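[Editor's sketch, not part of the original message.] The display routine in the patch above sizes each column by calling snprintf() with a NULL buffer and a size of 0, which writes nothing and returns the number of characters the value would need. A minimal standalone illustration of that trick follows; value_width() and column_width() are hypothetical names, and the portable "%llu" format is used here in place of the patch's "%Lu".

```c
#include <stdio.h>
#include <string.h>

/*
 * Width needed to print one value. With a NULL buffer and size 0,
 * snprintf() only reports the length the output would have.
 */
static int value_width(unsigned long long v)
{
	return snprintf(NULL, 0, "%llu", v);
}

/*
 * Column width = the longer of the header text and the widest value,
 * the same rule the patch applies to PID, TID and each counter column.
 */
static int column_width(const char *header,
			const unsigned long long *vals, int n)
{
	int width = (int)strlen(header);
	int i;

	for (i = 0; i < n; i++) {
		int w = value_width(vals[i]);

		if (w > width)
			width = w;
	}
	return width;
}
```

For the sample output in this thread, the "cache-misses" column keeps the header's width of 12 because every value is shorter, while the "PID" column grows to 4 to fit 4658. One thing that may be worth checking in the patch itself, offered only as an observation: a thread row allocated with malloc() is not zeroed, so a counter that never reported for that thread would be displayed with an indeterminate value.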
* [tip:perfcounters/core] perf report: Fix and improve the displaying of per-thread event counters
2009-08-07 11:55 ` [PATCH] perf report: Display per-thread event counters Brice Goglin
@ 2009-08-08 11:54 ` tip-bot for Brice Goglin
2009-08-08 12:14 ` [PATCH] perf report: Display " Ingo Molnar
1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Brice Goglin @ 2009-08-08 11:54 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, Brice.Goglin, hpa, mingo, a.p.zijlstra, tglx, mingo

Commit-ID:  1c0cbc4e4a707afea1fd88f5f297f094c8adff82
Gitweb:     http://git.kernel.org/tip/1c0cbc4e4a707afea1fd88f5f297f094c8adff82
Author:     Brice Goglin <Brice.Goglin@inria.fr>
AuthorDate: Fri, 7 Aug 2009 13:55:24 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sat, 8 Aug 2009 13:53:13 +0200

perf report: Fix and improve the displaying of per-thread event counters

Improve and fix the handling of per-thread counter stats recorded
via perf record -s. Previously we only displayed it in debug
printouts (-D) and even that output was hard to disambiguate.

I moved everything to util/values.[ch] so that we may reuse it in
perf stat.

We get something like this now:

#  PID  TID cache-misses cache-references
  4658 4659       495581          3238779
  4658 4662       498246          3236823
  4658 4663       499531          3243162

Then it'll be easy to add --pretty=raw to display a single line per
thread/event.

By the way, -S was also used for --symbol... So I used -T/--threads
here.

perf report: Add -T/--threads to display per-thread counter values

We get something like this now:

#  PID  TID cache-misses cache-references
  4658 4659       495581          3238779
  4658 4662       498246          3236823
  4658 4663       499531          3243162

Per-thread arrays of counter values are managed in util/values.[ch]

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
LKML-Reference: <4A7C162C.1030707@inria.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 tools/perf/Documentation/perf-report.txt |    3 +
 tools/perf/Makefile                      |    2 +
 tools/perf/builtin-report.c              |   25 +++++
 tools/perf/util/values.c                 |  171 ++++++++++++++++++++++++++++++
 tools/perf/util/values.h                 |   26 +++++
 5 files changed, 227 insertions(+), 0 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index e72e931..370344a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -27,6 +27,9 @@ OPTIONS
 -n
 --show-nr-samples
 	Show the number of samples for each symbol
+-T
+--threads
+	Show per-thread event counters
 -C::
 --comms=::
 	Only consider symbols in these comms. 
CSV that understands diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 1916e44..9fc133f 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -310,6 +310,7 @@ LIB_H += util/sigchain.h LIB_H += util/symbol.h LIB_H += util/module.h LIB_H += util/color.h +LIB_H += util/values.h LIB_OBJS += util/abspath.o LIB_OBJS += util/alias.o @@ -337,6 +338,7 @@ LIB_OBJS += util/color.o LIB_OBJS += util/pager.o LIB_OBJS += util/header.o LIB_OBJS += util/callchain.o +LIB_OBJS += util/values.o BUILTIN_OBJS += builtin-annotate.o BUILTIN_OBJS += builtin-help.o diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 99274ce..4163918 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -17,6 +17,7 @@ #include "util/string.h" #include "util/callchain.h" #include "util/strlist.h" +#include "util/values.h" #include "perf.h" #include "util/header.h" @@ -53,6 +54,9 @@ static int modules; static int full_paths; static int show_nr_samples; +static int show_threads; +static struct perf_read_values show_threads_values; + static unsigned long page_size; static unsigned long mmap_window = 32; @@ -1473,6 +1477,9 @@ print_entries: free(rem_sq_bracket); + if (show_threads) + perf_read_values_display(fp, &show_threads_values); + return ret; } @@ -1758,6 +1765,16 @@ process_read_event(event_t *event, unsigned long offset, unsigned long head) { struct perf_counter_attr *attr = perf_header__find_attr(event->read.id); + if (show_threads) { + char *name = attr ? 
__event_name(attr->type, attr->config) + : "unknown"; + perf_read_values_add_value(&show_threads_values, + event->read.pid, event->read.tid, + event->read.id, + name, + event->read.value); + } + dprintf("%p [%p]: PERF_EVENT_READ: %d %d %s %Lu\n", (void *)(offset + head), (void *)(long)(event->header.size), @@ -1839,6 +1856,9 @@ static int __cmd_report(void) register_idle_thread(); + if (show_threads) + perf_read_values_init(&show_threads_values); + input = open(input_name, O_RDONLY); if (input < 0) { fprintf(stderr, " failed to open file: %s", input_name); @@ -1993,6 +2013,9 @@ done: output__resort(total); output__fprintf(stdout, total); + if (show_threads) + perf_read_values_destroy(&show_threads_values); + return rc; } @@ -2066,6 +2089,8 @@ static const struct option options[] = { "load module symbols - WARNING: use only with -k and LIVE kernel"), OPT_BOOLEAN('n', "show-nr-samples", &show_nr_samples, "Show a column with the number of samples"), + OPT_BOOLEAN('T', "threads", &show_threads, + "Show per-thread event counters"), OPT_STRING('s', "sort", &sort_order, "key[,key2...]", "sort by key(s): pid, comm, dso, symbol, parent"), OPT_BOOLEAN('P', "full-paths", &full_paths, diff --git a/tools/perf/util/values.c b/tools/perf/util/values.c new file mode 100644 index 0000000..8551c0b --- /dev/null +++ b/tools/perf/util/values.c @@ -0,0 +1,171 @@ +#include <stdlib.h> + +#include "util.h" +#include "values.h" + +void perf_read_values_init(struct perf_read_values *values) +{ + values->threads_max = 16; + values->pid = malloc(values->threads_max * sizeof(*values->pid)); + values->tid = malloc(values->threads_max * sizeof(*values->tid)); + values->value = malloc(values->threads_max * sizeof(*values->value)); + if (!values->pid || !values->tid || !values->value) + die("failed to allocate read_values threads arrays"); + values->threads = 0; + + values->counters_max = 16; + values->counterrawid = malloc(values->counters_max + * sizeof(*values->counterrawid)); + 
values->countername = malloc(values->counters_max + * sizeof(*values->countername)); + if (!values->counterrawid || !values->countername) + die("failed to allocate read_values counters arrays"); + values->counters = 0; +} + +void perf_read_values_destroy(struct perf_read_values *values) +{ + int i; + + if (!values->threads_max || !values->counters_max) + return; + + for (i = 0; i < values->threads; i++) + free(values->value[i]); + free(values->pid); + free(values->tid); + free(values->counterrawid); + for (i = 0; i < values->counters; i++) + free(values->countername[i]); + free(values->countername); +} + +static void perf_read_values__enlarge_threads(struct perf_read_values *values) +{ + values->threads_max *= 2; + values->pid = realloc(values->pid, + values->threads_max * sizeof(*values->pid)); + values->tid = realloc(values->tid, + values->threads_max * sizeof(*values->tid)); + values->value = realloc(values->value, + values->threads_max * sizeof(*values->value)); + if (!values->pid || !values->tid || !values->value) + die("failed to enlarge read_values threads arrays"); +} + +static int perf_read_values__findnew_thread(struct perf_read_values *values, + u32 pid, u32 tid) +{ + int i; + + for (i = 0; i < values->threads; i++) + if (values->pid[i] == pid && values->tid[i] == tid) + return i; + + if (values->threads == values->threads_max) + perf_read_values__enlarge_threads(values); + + i = values->threads++; + values->pid[i] = pid; + values->tid[i] = tid; + values->value[i] = malloc(values->counters_max * sizeof(**values->value)); + if (!values->value[i]) + die("failed to allocate read_values counters array"); + + return i; +} + +static void perf_read_values__enlarge_counters(struct perf_read_values *values) +{ + int i; + + values->counters_max *= 2; + values->counterrawid = realloc(values->counterrawid, + values->counters_max * sizeof(*values->counterrawid)); + values->countername = realloc(values->countername, + values->counters_max * 
sizeof(*values->countername)); + if (!values->counterrawid || !values->countername) + die("failed to enlarge read_values counters arrays"); + + for (i = 0; i < values->threads; i++) { + values->value[i] = realloc(values->value[i], + values->counters_max * sizeof(**values->value)); + if (!values->value[i]) + die("failed to enlarge read_values counters arrays"); + } +} + +static int perf_read_values__findnew_counter(struct perf_read_values *values, + u64 rawid, char *name) +{ + int i; + + for (i = 0; i < values->counters; i++) + if (values->counterrawid[i] == rawid) + return i; + + if (values->counters == values->counters_max) + perf_read_values__enlarge_counters(values); + + i = values->counters++; + values->counterrawid[i] = rawid; + values->countername[i] = strdup(name); + + return i; +} + +void perf_read_values_add_value(struct perf_read_values *values, + u32 pid, u32 tid, + u64 rawid, char *name, u64 value) +{ + int tindex, cindex; + + tindex = perf_read_values__findnew_thread(values, pid, tid); + cindex = perf_read_values__findnew_counter(values, rawid, name); + + values->value[tindex][cindex] = value; +} + +void perf_read_values_display(FILE *fp, struct perf_read_values *values) +{ + int i, j; + int pidwidth, tidwidth; + int *counterwidth; + + counterwidth = malloc(values->counters * sizeof(*counterwidth)); + if (!counterwidth) + die("failed to allocate counterwidth array"); + tidwidth = 3; + pidwidth = 3; + for (j = 0; j < values->counters; j++) + counterwidth[j] = strlen(values->countername[j]); + for (i = 0; i < values->threads; i++) { + int width; + + width = snprintf(NULL, 0, "%d", values->pid[i]); + if (width > pidwidth) + pidwidth = width; + width = snprintf(NULL, 0, "%d", values->tid[i]); + if (width > tidwidth) + tidwidth = width; + for (j = 0; j < values->counters; j++) { + width = snprintf(NULL, 0, "%Lu", values->value[i][j]); + if (width > counterwidth[j]) + counterwidth[j] = width; + } + } + + fprintf(fp, "# %*s %*s", pidwidth, "PID", tidwidth, 
"TID"); + for (j = 0; j < values->counters; j++) + fprintf(fp, " %*s", counterwidth[j], values->countername[j]); + fprintf(fp, "\n"); + + for (i = 0; i < values->threads; i++) { + fprintf(fp, " %*d %*d", pidwidth, values->pid[i], + tidwidth, values->tid[i]); + for (j = 0; j < values->counters; j++) + fprintf(fp, " %*Lu", + counterwidth[j], values->value[i][j]); + fprintf(fp, "\n"); + } +} diff --git a/tools/perf/util/values.h b/tools/perf/util/values.h new file mode 100644 index 0000000..e41be5e --- /dev/null +++ b/tools/perf/util/values.h @@ -0,0 +1,26 @@ +#ifndef _PERF_VALUES_H +#define _PERF_VALUES_H + +#include "types.h" + +struct perf_read_values { + int threads; + int threads_max; + u32 *pid, *tid; + int counters; + int counters_max; + u64 *counterrawid; + char **countername; + u64 **value; +}; + +void perf_read_values_init(struct perf_read_values *values); +void perf_read_values_destroy(struct perf_read_values *values); + +void perf_read_values_add_value(struct perf_read_values *values, + u32 pid, u32 tid, + u64 rawid, char *name, u64 value); + +void perf_read_values_display(FILE *fp, struct perf_read_values *values); + +#endif /* _PERF_VALUES_H */ ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH] perf report: Display per-thread event counters
2009-08-07 11:55 ` [PATCH] perf report: Display per-thread event counters Brice Goglin
2009-08-08 11:54 ` [tip:perfcounters/core] perf report: Fix and improve the displaying of " tip-bot for Brice Goglin
@ 2009-08-08 12:14 ` Ingo Molnar
2009-08-08 16:10 ` Brice Goglin
1 sibling, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2009-08-08 12:14 UTC (permalink / raw)
To: Brice Goglin, Frédéric Weisbecker, Mike Galbraith, Arnaldo Carvalho de Melo
Cc: Peter Zijlstra, paulus, LKML

* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> Here's a better patch. I moved everything to util/values.[ch] so
> that we may reuse it in perf stat. [...]

Nice patch! I've applied it, you can find it in the latest -tip tree:

  http://people.redhat.com/mingo/tip.git/README

Please send enhancements/fixes on top of this.

> [...] But I don't see yet where I am supposed to get something like
> PERF_EVENT_READ in builtin-stat.c so I haven't touched it yet.

Yeah. 'perf stat' is not really getting events but is doing a
read-out of the counter value(s) and constructs its 'read event'
that way. So you won't find PERF_EVENT_READ in builtin-stat.c, you'll
find:

	res = read(fd[cpu][counter], single_count, nv * sizeof(u64));

in read_counter(). The printout is then done in print_counter().

> We get something like this now:
> #  PID  TID cache-misses cache-references
>   4658 4659       495581          3238779
>   4658 4662       498246          3236823
>   4658 4663       499531          3243162
>
> Then it'll be easy to add --pretty=raw to display a single line
> per thread/event.

ok.

> By the way, -S was also used for --symbol... So I used -T/--threads
> here.

Hm, indeed - and -s was taken for --sort. Maybe we could rename
-S/--symbols to -y/--symbols - this too is, I think, a rarely used
feature.

I think pure 'statistics' runs like you do will be a pretty popular
workflow, so intuitive naming/placement of options is important.

> perf report: Add -T/--threads to display per-thread counter values
>
> We get something like this now:
> #  PID  TID cache-misses cache-references
>   4658 4659       495581          3238779
>   4658 4662       498246          3236823
>   4658 4663       499531          3243162

Btw., another thing to do would be to allow the 'dual' recording of
both the stat values (collected when threads exit) and regular
samples that perf report deals with.

I.e. don't handle 'perf record -s' as mutually exclusive with regular
'perf record', but instead have a -s/--sample-type option that can
take such combinations:

   -s stats
   -s samples
   -s call-graph

And any combination thereof, such as:

   -s stats,samples

The default would be '-s samples'.

Right now call-graph recording is triggered via a separate option
(-g/--call-graph) - but maybe it could be merged into a more generic
-s/--sample-type option mechanism?

	Ingo
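[Editor's sketch, not part of the original message.] The combined option proposed above takes a comma-separated list such as "stats,samples" and defaults to "samples". Purely as an illustration of how such an argument could be turned into a bitmask, here is a small parser. Everything in it is hypothetical: the SAMPLE_* constants and parse_sample_types() do not exist in the perf tools, and the option itself is only a proposal in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values, one per mode named in the proposal. */
enum {
	SAMPLE_STATS      = 1 << 0,
	SAMPLE_SAMPLES    = 1 << 1,
	SAMPLE_CALL_GRAPH = 1 << 2,
};

/*
 * Parse a comma-separated list such as "stats,samples" into a bitmask.
 * A NULL or empty argument yields the proposed default ("samples").
 * Returns -1 if any token is empty or not one of the known names.
 */
static int parse_sample_types(const char *arg)
{
	static const struct {
		const char *name;
		int bit;
	} known[] = {
		{ "stats",      SAMPLE_STATS },
		{ "samples",    SAMPLE_SAMPLES },
		{ "call-graph", SAMPLE_CALL_GRAPH },
	};
	const char *p = arg;
	int mask = 0;

	if (!arg || !*arg)
		return SAMPLE_SAMPLES;

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of the current token */
		size_t i;
		int found = 0;

		for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
			if (strlen(known[i].name) == len &&
			    strncmp(p, known[i].name, len) == 0) {
				mask |= known[i].bit;
				found = 1;
				break;
			}
		}
		if (!found)
			return -1;

		p += len;
		if (*p == ',')
			p++;	/* step over the separator */
	}
	return mask;
}
```

A bitmask keeps the three modes independent, so "-s stats,samples" and "-s samples,stats" would mean the same thing, and folding in call-graph recording would only need one more bit.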
* Re: [PATCH] perf report: Display per-thread event counters
2009-08-08 12:14 ` [PATCH] perf report: Display " Ingo Molnar
@ 2009-08-08 16:10 ` Brice Goglin
2009-08-08 16:13 ` Ingo Molnar
0 siblings, 1 reply; 41+ messages in thread
From: Brice Goglin @ 2009-08-08 16:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Frédéric Weisbecker, Mike Galbraith, Arnaldo Carvalho de Melo, Peter Zijlstra, paulus, LKML

Ingo Molnar wrote:
>> [...] But I don't see yet where I am suppose to get something like
>> PERF_READ_EVENT in builtin-stat.c so I haven't touched it yet.
>>
>
> Yeah. 'perf stat' is not really getting events but is doing a
> read-out of the counter value(s) and constructs its 'read event'
> that way. So you wont find PERF_READ_EVENT in builtin-stat.c, you'll
> find:
>
>         res = read(fd[cpu][counter], single_count, nv * sizeof(u64));
>
> in read_counter(). The printout is then done in print_counter().
>

Is there a way to get per-thread counters there? I wrote the code to
gather per-cpu counters there, but I don't see any way to get the
corresponding thread-id.

I looked at perf record to get some help. But I don't see where the
PERF_EVENT_READ are generated. I guess they are directly generated by
the kernel, read by perf record, and written as is to the output file?

thanks,
Brice

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] perf report: Display per-thread event counters
2009-08-08 16:10 ` Brice Goglin
@ 2009-08-08 16:13 ` Ingo Molnar
0 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2009-08-08 16:13 UTC (permalink / raw)
To: Brice Goglin
Cc: Frédéric Weisbecker, Mike Galbraith, Arnaldo Carvalho de Melo, Peter Zijlstra, paulus, LKML

* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> Ingo Molnar wrote:
> >> [...] But I don't see yet where I am suppose to get something like
> >> PERF_READ_EVENT in builtin-stat.c so I haven't touched it yet.
> >>
> >
> > Yeah. 'perf stat' is not really getting events but is doing a
> > read-out of the counter value(s) and constructs its 'read event'
> > that way. So you wont find PERF_READ_EVENT in builtin-stat.c, you'll
> > find:
> >
> >         res = read(fd[cpu][counter], single_count, nv * sizeof(u64));
> >
> > in read_counter(). The printout is then done in print_counter().
>
> Is there a way to get per-thread counters there? I wrote the code
> to gather per-cpu counters there, but I don't see any way to get
> the corresponding thread-id.
>
> I looked at perf record to get some help. But I don't see where
> the PERF_EVENT_READ are generated. I guess they are directly
> generated by the kernel, read by perf record, and written as is to
> the output file?

Inherited counters are not accessible to the parent context. (they
dont even have any fds instantiated, for performance and
transparency reasons.)

I think perf stat could be enhanced to work not via reading the raw
counters but by doing a mini "perf-record" internally, mmap the
samples buffer and getting all the events there?

	Ingo

^ permalink raw reply [flat|nested] 41+ messages in thread
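For context on the `read()` call quoted in the message above: when a counter is opened with both time fields requested, the buffer it fills is commonly laid out as three 64-bit words (value, time enabled, time running), and a tool can use the two times to estimate the full count of a counter that was multiplexed. The sketch below illustrates that arithmetic only; the struct name, the helper `scale_count` and the assumed field order are illustrative assumptions, not code taken from builtin-stat.c.

```c
#include <stdint.h>

/*
 * Assumed layout of the words returned by read() on a counter fd when
 * both time fields are requested: value, time enabled, time running.
 */
struct counter_readout {
	uint64_t value;
	uint64_t time_enabled;
	uint64_t time_running;
};

/*
 * Hypothetical helper: estimate what the count would have been had the
 * counter been running for the whole time it was enabled.
 * Returns 0 and stores the estimate in *scaled, or -1 if the counter
 * never ran (no estimate is possible).
 */
int scale_count(const struct counter_readout *r, uint64_t *scaled)
{
	if (r->time_running == 0)
		return -1;

	/* Counter ran the whole time: the raw value is already exact. */
	if (r->time_running >= r->time_enabled) {
		*scaled = r->value;
		return 0;
	}

	/* Extrapolate linearly and round to the nearest integer. */
	*scaled = (uint64_t)((double)r->value * (double)r->time_enabled /
			     (double)r->time_running + 0.5);
	return 0;
}
```

This only covers interpreting one read-out; as the reply above notes, inherited per-thread counters have no fd to read from, so per-thread numbers would still have to come from recorded events.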
* [tip:perfcounters/urgent] perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header
2009-08-06 18:57 ` [PATCH] perf tools: Fix reading of perf.data file header Peter Zijlstra
2009-08-06 19:03 ` Brice Goglin
@ 2009-08-07 6:37 ` tip-bot for Peter Zijlstra
2009-08-07 7:39 ` tip-bot for Peter Zijlstra
2 siblings, 0 replies; 41+ messages in thread
From: tip-bot for Peter Zijlstra @ 2009-08-07 6:37 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, Brice.Goglin, hpa, mingo, a.p.zijlstra, tglx, mingo

Commit-ID:  28d4db10a0f2a2a6cb9a843522f433ff0185c784
Gitweb:     http://git.kernel.org/tip/28d4db10a0f2a2a6cb9a843522f433ff0185c784
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Thu, 6 Aug 2009 20:57:41 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 7 Aug 2009 08:32:55 +0200

perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header

Brice Goglin reported that only the first result from a
multi-counter perf record --stat run is accurate, the rest
looks bogus.

A silly mistake made us re-read the first attribute for every
recorded attribute.

Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: paulus@samba.org
LKML-Reference: <1249585061.4975.17.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

---
 tools/perf/util/header.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 450384b..95a44bc 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -213,9 +213,10 @@ struct perf_header *perf_header__read(int fd)
 
 	for (i = 0; i < nr_attrs; i++) {
 		struct perf_header_attr *attr;
-		off_t tmp = lseek(fd, 0, SEEK_CUR);
+		off_t tmp;
 
 		do_read(fd, &f_attr, sizeof(f_attr));
+		tmp = lseek(fd, 0, SEEK_CUR);
 
 		attr = perf_header_attr__new(&f_attr.attr);
 

^ permalink raw reply related [flat|nested] 41+ messages in thread
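The effect of the two-line change above can be reproduced in isolation. The sketch below is a simplified stand-in, not the perf code: it uses stdio (`ftell`/`fseek`) instead of `lseek`, and the record layout `file_attr`, the helper `make_sample_file` and the flag `save_before_read` are hypothetical. Each record points at an id stored elsewhere in the file, so the reader must jump away and come back. If the "come back" offset is saved before the record is read, every iteration returns to the start of the same record.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk record: a config word plus the offset of its id. */
struct file_attr {
	uint64_t config;
	uint64_t id_offset;
};

/* Build a temporary file holding two attr records followed by two ids. */
FILE *make_sample_file(void)
{
	struct file_attr attrs[2] = {
		{ 0x11, 2 * sizeof(struct file_attr) },
		{ 0x22, 2 * sizeof(struct file_attr) + sizeof(uint64_t) },
	};
	uint64_t ids[2] = { 100, 200 };
	FILE *fp = tmpfile();

	if (!fp)
		return NULL;
	if (fwrite(attrs, sizeof(attrs[0]), 2, fp) != 2 ||
	    fwrite(ids, sizeof(ids[0]), 2, fp) != 2 ||
	    fseek(fp, 0, SEEK_SET) != 0) {
		fclose(fp);
		return NULL;
	}
	return fp;
}

/*
 * Read nr_attrs records starting at the current position.
 * save_before_read != 0 mimics the bug: the resume offset is taken
 * before the record is consumed, so the loop never advances.
 */
int read_attrs(FILE *fp, int nr_attrs, int save_before_read,
	       uint64_t *configs, uint64_t *ids)
{
	int i;

	for (i = 0; i < nr_attrs; i++) {
		struct file_attr fa;
		long resume = 0;

		if (save_before_read)
			resume = ftell(fp);	/* points at this record */
		if (fread(&fa, sizeof(fa), 1, fp) != 1)
			return -1;
		if (!save_before_read)
			resume = ftell(fp);	/* points at the next record */
		if (resume < 0)
			return -1;

		/* Jump to the id belonging to this record and read it. */
		if (fseek(fp, (long)fa.id_offset, SEEK_SET) != 0)
			return -1;
		if (fread(&ids[i], sizeof(ids[i]), 1, fp) != 1)
			return -1;

		/* Return to where the record loop should continue. */
		if (fseek(fp, resume, SEEK_SET) != 0)
			return -1;
		configs[i] = fa.config;
	}
	return 0;
}
```

With the offset saved after the read, the two records come back as distinct entries; with it saved before, the first record is returned twice, which matches the symptom described in the commit message (only the first counter looked right).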
* [tip:perfcounters/urgent] perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header
2009-08-06 18:57 ` [PATCH] perf tools: Fix reading of perf.data file header Peter Zijlstra
2009-08-06 19:03 ` Brice Goglin
2009-08-07 6:37 ` [tip:perfcounters/urgent] perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header tip-bot for Peter Zijlstra
@ 2009-08-07 7:39 ` tip-bot for Peter Zijlstra
2 siblings, 0 replies; 41+ messages in thread
From: tip-bot for Peter Zijlstra @ 2009-08-07 7:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, Brice.Goglin, hpa, mingo, a.p.zijlstra, tglx, mingo

Commit-ID:  ce71e78d91c4e7c02ce85f26319e53a824be4ffb
Gitweb:     http://git.kernel.org/tip/ce71e78d91c4e7c02ce85f26319e53a824be4ffb
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Thu, 6 Aug 2009 20:57:41 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 7 Aug 2009 09:37:29 +0200

perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header

Brice Goglin reported that only the first result from a
multi-counter perf record --stat run is accurate, the rest
looks bogus.

A silly mistake made us re-read the first attribute for every
recorded attribute.

Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: paulus@samba.org
LKML-Reference: <1249585061.4975.17.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

---
 tools/perf/util/header.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 450384b..95a44bc 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -213,9 +213,10 @@ struct perf_header *perf_header__read(int fd)
 
 	for (i = 0; i < nr_attrs; i++) {
 		struct perf_header_attr *attr;
-		off_t tmp = lseek(fd, 0, SEEK_CUR);
+		off_t tmp;
 
 		do_read(fd, &f_attr, sizeof(f_attr));
+		tmp = lseek(fd, 0, SEEK_CUR);
 
 		attr = perf_header_attr__new(&f_attr.attr);
 

^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-08-06 17:40 ` Peter Zijlstra
2009-08-06 17:48 ` Brice Goglin
@ 2009-08-06 19:01 ` Brice Goglin
1 sibling, 0 replies; 41+ messages in thread
From: Brice Goglin @ 2009-08-06 19:01 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, paulus, LKML

Tested-by: Brice Goglin <Brice.Goglin@inria.fr>

This one and the next patch made the trick.

Peter Zijlstra wrote:
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 8cb58d6..c053fd8 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -112,7 +112,9 @@ struct read_event {
>  	struct perf_event_header header;
>  	u32 pid,tid;
>  	u64 value;
> -	u64 format[3];
> +	u64 time_enabled;
> +	u64 time_running;
> +	u64 id;
>  };
>  
>  typedef union event_union {
> @@ -1690,14 +1692,37 @@ static void trace_event(event_t *event)
>  	dprintf(".\n");
>  }
>  
> +static struct perf_header *header;
> +
> +static struct perf_counter_attr *perf_header__find_attr(u64 id)
> +{
> +	int i;
> +
> +	for (i = 0; i < header->attrs; i++) {
> +		struct perf_header_attr *attr = header->attr[i];
> +		int j;
> +
> +		for (j = 0; j < attr->ids; j++) {
> +			if (attr->id[j] == id)
> +				return &attr->attr;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
>  static int
>  process_read_event(event_t *event, unsigned long offset, unsigned long head)
>  {
> -	dprintf("%p [%p]: PERF_EVENT_READ: %d %d %Lu\n",
> +	struct perf_counter_attr *attr = perf_header__find_attr(event->read.id);
> +
> +	dprintf("%p [%p]: PERF_EVENT_READ: %d %d %s %Lu\n",
>  			(void *)(offset + head),
>  			(void *)(long)(event->header.size),
>  			event->read.pid,
>  			event->read.tid,
> +			attr ? __event_name(attr->type, attr->config)
> +			     : "FAIL",
>  			event->read.value);
>  
>  	return 0;
> @@ -1743,8 +1768,6 @@ process_event(event_t *event, unsigned long offset, unsigned long head)
>  	return 0;
>  }
>  
> -static struct perf_header *header;
> -
>  static u64 perf_header__sample_type(void)
>  {
>  	u64 sample_type = 0;
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 7bdad8d..4858d83 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -223,9 +239,15 @@ char *event_name(int counter)
>  {
>  	u64 config = attrs[counter].config;
>  	int type = attrs[counter].type;
> +
> +	return __event_name(type, config);
> +}
> +
> +char *__event_name(int type, u64 config)
> +{
>  	static char buf[32];
>  
> -	if (attrs[counter].type == PERF_TYPE_RAW) {
> +	if (type == PERF_TYPE_RAW) {
>  		sprintf(buf, "raw 0x%llx", config);
>  		return buf;
>  	}
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index 1ea5d09..192a962 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -10,6 +10,7 @@ extern int nr_counters;
>  extern struct perf_counter_attr attrs[MAX_COUNTERS];
>  
>  extern char *event_name(int ctr);
> +extern char *__event_name(int type, u64 config);
>  
>  extern int parse_events(const struct option *opt, const char *str, int unset);
>  
>
>

^ permalink raw reply [flat|nested] 41+ messages in thread
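The lookup added by the quoted patch maps the id carried in a read event back to the attribute that produced it, by scanning every attribute's id list. A stripped-down, standalone version of that nested search might look as follows. The struct and function names here (`counter_attr`, `header_attr`, `header`, `find_attr`) are simplified stand-ins invented for illustration; the real structures hold more fields and store the attributes as an array of pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the counter attribute (type + config only). */
struct counter_attr {
	uint32_t type;
	uint64_t config;
};

/* One recorded attribute together with the ids the kernel assigned to it. */
struct header_attr {
	struct counter_attr attr;
	int ids;		/* number of entries in id[] */
	const uint64_t *id;
};

/* All recorded attributes of one data file. */
struct header {
	int attrs;		/* number of entries in attr[] */
	const struct header_attr *attr;
};

/*
 * Return the attribute owning the given id, or NULL if no attribute
 * lists it (the quoted patch prints "FAIL" in that case).
 */
const struct counter_attr *find_attr(const struct header *h, uint64_t id)
{
	int i, j;

	for (i = 0; i < h->attrs; i++) {
		const struct header_attr *ha = &h->attr[i];

		for (j = 0; j < ha->ids; j++) {
			if (ha->id[j] == id)
				return &ha->attr;
		}
	}
	return NULL;
}
```

The search is linear in the total number of ids, which is reasonable for the handful of counters a typical run records; a larger id space would presumably call for a hash or sorted table, but that is outside what the thread discusses.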
* Re: [perf] howto switch from pfmon
2009-06-23 13:14 ` Ingo Molnar
` (2 preceding siblings ...)
2009-06-23 13:47 ` Ingo Molnar
@ 2009-06-23 14:21 ` Brice Goglin
2009-06-23 14:51 ` Ingo Molnar
3 siblings, 1 reply; 41+ messages in thread
From: Brice Goglin @ 2009-06-23 14:21 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, paulus, LKML

Ingo Molnar wrote:
> You can also do a profile with such events:
>
>   perf record -f -e r1000ffe0 ./hackbench 10
>
> and look at it via 'perf report'.
>

I am not sure what the perf.data profile file contains but 'perf report'
only shows percentages. Is there a way to get a 'perf stat'-like output
from 'perf report'? Or maybe just have a -f option in 'perf stat' to
send the output into a file (with the PID in the name).

By the way, there's a typo in the description in
tools/perf/Documentation/perf-report.txt, you want s/via perf report/via
perf record/

> [ Note, there's no need to specify any --follow-* flags as that is
>   implicit in 'perf'. (and you'll probably also notice that perf
>   stat is a lot faster at following fast-forking or
>   context-switching workloads than is pfmon, because it's not ptrace
>   based.) ]
>

What about threads? I didn't find any way to get per-thread counters.

Ideally, I'd like to be able to see no perf-related output on
stdout/stderr at runtime, and later have a look at per-thread counters
like 'perf stat' does at runtime.

thanks,
Brice

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 14:21 ` Brice Goglin
@ 2009-06-23 14:51 ` Ingo Molnar
2009-06-23 15:29 ` Jaswinder Singh Rajput
0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2009-06-23 14:51 UTC (permalink / raw)
To: Brice Goglin, Mike Galbraith, Arnaldo Carvalho de Melo, Jaswinder Singh Rajput, Thomas Gleixner
Cc: Peter Zijlstra, paulus, LKML

* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> Ingo Molnar wrote:
> > You can also do a profile with such events:
> >
> >   perf record -f -e r1000ffe0 ./hackbench 10
> >
> > and look at it via 'perf report'.
> >
>
> I am not sure what the perf.data profile file contains but 'perf
> report' only shows percentages. Is there a way to get a 'perf
> stat'-like output from 'perf report'? Or maybe just have a -f
> option in 'perf stat' to send the output into a file (with the PID
> in the name).

It's not yet possible but it's a very good feature request.

> By the way, there's a typo in the description in
> tools/perf/Documentation/perf-report.txt, you want s/via perf
> report/via perf record/

thanks, fixed and pushed out. You can generally find the latest
'perf' stuff at:

  http://people.redhat.com/mingo/tip.git/README

> > [ Note, there's no need to specify any --follow-* flags as that is
> >   implicit in 'perf'. (and you'll probably also notice that perf
> >   stat is a lot faster at following fast-forking or
> >   context-switching workloads than is pfmon, because it's not ptrace
> >   based.) ]
>
> What about threads? I didn't find any way to get per-thread
> counters.
>
> Ideally, I'd like to be able to see no perf-related output on
> stdout/stderr at runtime, and later have a look at per-thread
> counters like 'perf stat' does at runtime.

That's not possible yet either, but makes a lot of sense. How many
threads does your workload typically run, and how do you get their
stats displayed?

Per thread info is currently available in the profile output:

  perf report --sort comm,pid,symbol

But it would be nice to either extend perf report with a --stat
option:

  perf report --stat

or to extend perf stat to take an input file via -i:

  perf stat -i perf.data

	Ingo

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [perf] howto switch from pfmon
2009-06-23 14:51 ` Ingo Molnar
@ 2009-06-23 15:29 ` Jaswinder Singh Rajput
0 siblings, 0 replies; 41+ messages in thread
From: Jaswinder Singh Rajput @ 2009-06-23 15:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: Brice Goglin, Mike Galbraith, Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra, paulus, LKML

On Tue, 2009-06-23 at 16:51 +0200, Ingo Molnar wrote:

> Per thread info is currently available in the profile output:
>
>   perf report --sort comm,pid,symbol
>
> But it would be nice to either extend perf report with a --stat
> option:
>
>   perf report --stat
>
> or to extend perf stat to take an input file via -i:
>
>   perf stat -i perf.data
>

I prefer 'perf report --stat' as it is already handling file.

--
JSR

^ permalink raw reply [flat|nested] 41+ messages in thread