* perf-stat per thread results
@ 2011-05-13 15:14 Wim Heirman
2011-05-13 15:30 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Wim Heirman @ 2011-05-13 15:14 UTC (permalink / raw)
To: linux-kernel
Hi all,
I'm using perf-stat to look at hardware performance counters for a
parallel program. Is there a way to get counter values for each thread
individually, rather than aggregated for the whole process? I know I
can attach to a specific thread using --tid=, but due to the time
required to find the tid and attach/detach this isn't accurate for
short-running programs.
Or, alternatively, can I use perf record --stat and get an exact count
for each performance counter?
Thanks,
Wim Heirman
Ghent University, Belgium
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: perf-stat per thread results 2011-05-13 15:14 perf-stat per thread results Wim Heirman @ 2011-05-13 15:30 ` Ingo Molnar 2011-05-13 15:41 ` David Ahern 2011-05-13 15:35 ` David Ahern 2011-05-13 20:11 ` Juri Lelli 2 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2011-05-13 15:30 UTC (permalink / raw) To: Wim Heirman Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo, Thomas Gleixner, Frédéric Weisbecker * Wim Heirman <wim@heirman.net> wrote: > Hi all, > > I'm using perf-stat to look at hardware performance counters for a > parallel program. Is there a way to get counter values for each thread > individually, rather than aggregated for the whole process? [...] Not at the moment, but it would be a useful feature. > [...] I know I can attach to a specific thread using --tid=, but due to the > time required to find the tid and attach/detach this isn't accurate for > short-running programs. Or, alternatively, can I use perf record --stat and > get an exact count for each performance counter? Yes perf record --stat should work. 'perf report -T --stdio' is supposed to print this, but it has regressed i think. Arnaldo, any ideas? Thanks, Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 15:30 ` Ingo Molnar @ 2011-05-13 15:41 ` David Ahern 0 siblings, 0 replies; 15+ messages in thread From: David Ahern @ 2011-05-13 15:41 UTC (permalink / raw) To: Ingo Molnar Cc: Wim Heirman, linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo, Thomas Gleixner, Frédéric Weisbecker On 05/13/11 09:30, Ingo Molnar wrote: > Yes perf record --stat should work. 'perf report -T --stdio' is supposed to > print this, but it has regressed i think. There is a known regression in 2.6.39; perf-record and perf-top cannot profile all threads in a process. They fail with: Fatal: failed to mmap with 22 (Invalid argument) Curiously, perf-stat does work - or at least does not fail with the mmap error. David > > Arnaldo, any ideas? > > Thanks, > > Ingo > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 15:14 perf-stat per thread results Wim Heirman 2011-05-13 15:30 ` Ingo Molnar @ 2011-05-13 15:35 ` David Ahern 2011-05-13 15:44 ` Ingo Molnar 2011-05-13 20:11 ` Juri Lelli 2 siblings, 1 reply; 15+ messages in thread From: David Ahern @ 2011-05-13 15:35 UTC (permalink / raw) To: Wim Heirman; +Cc: linux-kernel On 05/13/11 09:14, Wim Heirman wrote: > Hi all, > > I'm using perf-stat to look at hardware performance counters for a > parallel program. Is there a way to get counter values for each thread > individually, rather than aggregated for the whole process? I know I > can attach to a specific thread using --tid=, but due to the time > required to find the tid and attach/detach this isn't accurate for > short-running programs. perf-stat requires changes to dump counters per thread; it currently sums all threads into a single value. > Or, alternatively, can I use perf record --stat and get an exact count > for each performance counter? perf-record does not read values from hardware counters. David > > Thanks, > Wim Heirman > Ghent University, Belgium > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 15:35 ` David Ahern @ 2011-05-13 15:44 ` Ingo Molnar 2011-05-13 20:32 ` Wim Heirman 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2011-05-13 15:44 UTC (permalink / raw) To: David Ahern Cc: Wim Heirman, linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin * David Ahern <dsahern@gmail.com> wrote: > On 05/13/11 09:14, Wim Heirman wrote: > > Hi all, > > > > I'm using perf-stat to look at hardware performance counters for a > > parallel program. Is there a way to get counter values for each thread > > individually, rather than aggregated for the whole process? I know I > > can attach to a specific thread using --tid=, but due to the time > > required to find the tid and attach/detach this isn't accurate for > > short-running programs. > > perf-stat requires changes to dump counters per thread; it currently > sums all threads into a single value. > > > Or, alternatively, can I use perf record --stat and get an exact count > > for each performance counter? > > perf-record does not read values from hardware counters. It's supposed to do that if --stat is specified, and it used to work - see this commit: 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters and the output there: # PID TID cache-misses cache-references 4658 4659 495581 3238779 4658 4662 498246 3236823 4658 4663 499531 3243162 which appears to be roughly what Wim is asking for, AFAICT. But this seems to have regressed meanwhile. Thanks, Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 15:44 ` Ingo Molnar @ 2011-05-13 20:32 ` Wim Heirman 2011-05-13 20:45 ` Arnaldo Carvalho de Melo 2011-05-13 23:02 ` David Ahern 0 siblings, 2 replies; 15+ messages in thread From: Wim Heirman @ 2011-05-13 20:32 UTC (permalink / raw) To: Ingo Molnar Cc: David Ahern, linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin Hi, > It's supposed to do that if --stat is specified, and it used to work - see this > commit: > > 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters > > and the output there: > > # PID TID cache-misses cache-references > 4658 4659 495581 3238779 > 4658 4662 498246 3236823 > 4658 4663 499531 3243162 > > which appears to be roughly what Wim is asking for, AFAICT. Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04) it works, although if I use --pid rather than the -- <command> variant the first thread seams to be missing. In 2.6.38 (Ubuntu 11.04) the first thread is missing in both use cases, and I get one column per processor (which in itself is fine). Regards, Wim ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 20:32 ` Wim Heirman @ 2011-05-13 20:45 ` Arnaldo Carvalho de Melo 2011-05-14 12:45 ` Wim Heirman 2011-05-13 23:02 ` David Ahern 1 sibling, 1 reply; 15+ messages in thread From: Arnaldo Carvalho de Melo @ 2011-05-13 20:45 UTC (permalink / raw) To: Wim Heirman Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin Em Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman escreveu: > > It's supposed to do that if --stat is specified, and it used to work - see this > > commit: > > 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters > > and the output there: > > > > # PID TID cache-misses cache-references > > 4658 4659 495581 3238779 > > 4658 4662 498246 3236823 > > 4658 4663 499531 3243162 > > which appears to be roughly what Wim is asking for, AFAICT. > Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04) > it works, although if I use --pid rather than the -- <command> variant > the first thread seams to be missing. In 2.6.38 (Ubuntu 11.04) the > first thread is missing in both use cases, and I get one column per > processor (which in itself is fine). Can you try after applying the patches in this message: http://marc.info/?l=linux-kernel&m=130385067430510&w=2 and report your results? If it fixes the problems you're experiencing, please provide a: Tested-by: Wim Heirman <wim@heirman.net> So that I can add when sending them to Ingo. Thanks, - Arnaldo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 20:45 ` Arnaldo Carvalho de Melo @ 2011-05-14 12:45 ` Wim Heirman 2011-05-14 20:24 ` Wim Heirman 0 siblings, 1 reply; 15+ messages in thread From: Wim Heirman @ 2011-05-14 12:45 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin On 13 May 2011 22:45, Arnaldo Carvalho de Melo <acme@redhat.com> wrote: > Em Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman escreveu: >> > It's supposed to do that if --stat is specified, and it used to work - see this >> > commit: > >> > 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters > >> > and the output there: >> > >> > # PID TID cache-misses cache-references >> > 4658 4659 495581 3238779 >> > 4658 4662 498246 3236823 >> > 4658 4663 499531 3243162 > >> > which appears to be roughly what Wim is asking for, AFAICT. > >> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04) >> it works, although if I use --pid rather than the -- <command> variant >> the first thread seams to be missing. In 2.6.38 (Ubuntu 11.04) the >> first thread is missing in both use cases, and I get one column per >> processor (which in itself is fine). > > Can you try after applying the patches in this message: > > http://marc.info/?l=linux-kernel&m=130385067430510&w=2 > > and report your results? Sorry, no improvement. $ ./perf record -e cycles --stat -- ./fft -p4 -m24 && ./perf report --thread | tail [ perf record: Woken up 5 times to write data ] [ perf record: Captured and wrote 1.198 MB perf.data (~52331 samples) ] # PID TID cpu-clock 954 958 8067423322 954 957 6761317556 954 956 6006327147 $ ls /proc/$(pidof fft)/task 954 956 957 958 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-14 12:45 ` Wim Heirman @ 2011-05-14 20:24 ` Wim Heirman 0 siblings, 0 replies; 15+ messages in thread From: Wim Heirman @ 2011-05-14 20:24 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin On 14 May 2011 14:45, Wim Heirman <wim@heirman.net> wrote: > On 13 May 2011 22:45, Arnaldo Carvalho de Melo <acme@redhat.com> wrote: >> Em Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman escreveu: >>> > It's supposed to do that if --stat is specified, and it used to work - see this >>> > commit: >> >>> > 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters >> >>> > and the output there: >>> > >>> > # PID TID cache-misses cache-references >>> > 4658 4659 495581 3238779 >>> > 4658 4662 498246 3236823 >>> > 4658 4663 499531 3243162 >> >>> > which appears to be roughly what Wim is asking for, AFAICT. >> >>> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04) >>> it works, although if I use --pid rather than the -- <command> variant >>> the first thread seams to be missing. In 2.6.38 (Ubuntu 11.04) the >>> first thread is missing in both use cases, and I get one column per >>> processor (which in itself is fine). >> >> Can you try after applying the patches in this message: >> >> http://marc.info/?l=linux-kernel&m=130385067430510&w=2 >> >> and report your results? > > Sorry, no improvement. > > $ ./perf record -e cycles --stat -- ./fft -p4 -m24 && ./perf report > --thread | tail > [ perf record: Woken up 5 times to write data ] > [ perf record: Captured and wrote 1.198 MB perf.data (~52331 samples) ] > # PID TID cpu-clock > 954 958 8067423322 > 954 957 6761317556 > 954 956 6006327147 > > $ ls /proc/$(pidof fft)/task > 954 956 957 958 Looks like perf-report --thread is reading PERF_RECORD_READ events from perf.data. But these are only emitted by the kernel for child threads: in kernel/events/core.c, the only call to perf_event_read_event() is in sync_child_event(). Should perf-record then be adapted to do something like perf-stat does and use __perf_evsel__read to read the parent counter's final values and add them to perf.data ? That way perf-report can subtract all children from the final value and get the main thread's counter values. Regards, Wim ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 20:32 ` Wim Heirman 2011-05-13 20:45 ` Arnaldo Carvalho de Melo @ 2011-05-13 23:02 ` David Ahern 2011-05-14 12:49 ` Wim Heirman 1 sibling, 1 reply; 15+ messages in thread From: David Ahern @ 2011-05-13 23:02 UTC (permalink / raw) To: Wim Heirman, Ingo Molnar, Arnaldo Carvalho de Melo Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin On 05/13/11 14:32, Wim Heirman wrote: > Hi, > >> It's supposed to do that if --stat is specified, and it used to work - see this >> commit: >> >> 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters >> >> and the output there: >> >> # PID TID cache-misses cache-references >> 4658 4659 495581 3238779 >> 4658 4662 498246 3236823 >> 4658 4663 499531 3243162 >> >> which appears to be roughly what Wim is asking for, AFAICT. > > Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04) > it works, although if I use --pid rather than the -- <command> variant > the first thread seams to be missing. In 2.6.38 (Ubuntu 11.04) the > first thread is missing in both use cases, and I get one column per > processor (which in itself is fine). > > Regards, > Wim Hmm.... my mileage varies using latest kernel (446cc6345d3de6571bdd0840f48aca441488a28d) $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd) ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ] $ /tmp/build-perf/perf report -T -i /tmp/perf.data # Events: 6 cycles # # Overhead Command Shared Object Symbol # ........ ........ ................. .......................... # 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe # # (For a higher level overview, try: perf report --sort comm,dso) # # PID TID ie., I do not get the counter values. Specifying the counter with -e (e.g., -e branch-misses) does not help -- still no counter output. David ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 23:02 ` David Ahern @ 2011-05-14 12:49 ` Wim Heirman 2011-05-14 17:48 ` David Ahern 0 siblings, 1 reply; 15+ messages in thread From: Wim Heirman @ 2011-05-14 12:49 UTC (permalink / raw) To: David Ahern Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin On 14 May 2011 01:02, David Ahern <dsahern@gmail.com> wrote: > > > On 05/13/11 14:32, Wim Heirman wrote: >> Hi, >> >>> It's supposed to do that if --stat is specified, and it used to work - see this >>> commit: >>> >>> 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters >>> >>> and the output there: >>> >>> # PID TID cache-misses cache-references >>> 4658 4659 495581 3238779 >>> 4658 4662 498246 3236823 >>> 4658 4663 499531 3243162 >>> >>> which appears to be roughly what Wim is asking for, AFAICT. >> >> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04) >> it works, although if I use --pid rather than the -- <command> variant >> the first thread seams to be missing. In 2.6.38 (Ubuntu 11.04) the >> first thread is missing in both use cases, and I get one column per >> processor (which in itself is fine). >> >> Regards, >> Wim > > Hmm.... my mileage varies using latest kernel > (446cc6345d3de6571bdd0840f48aca441488a28d) > > $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd) > ^C[ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ] > > $ /tmp/build-perf/perf report -T -i /tmp/perf.data > # Events: 6 cycles > # > # Overhead Command Shared Object Symbol > # ........ ........ ................. .......................... > # > 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel > 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe > > > # > # (For a higher level overview, try: perf report --sort comm,dso) > # > # PID TID > > > ie., I do not get the counter values. Specifying the counter with -e > (e.g., -e branch-misses) does not help -- still no counter output. Is rsyslogd multithreaded? (Or at least, do the non-main threads execute any work during your perf-record measurement) If not, then what you see is consistent with what I'm getting, i.e. everything but the main thread is reported. Regards, Wim ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-14 12:49 ` Wim Heirman @ 2011-05-14 17:48 ` David Ahern 2011-05-14 18:52 ` Wim Heirman 0 siblings, 1 reply; 15+ messages in thread From: David Ahern @ 2011-05-14 17:48 UTC (permalink / raw) To: Wim Heirman Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin On 05/14/11 06:49, Wim Heirman wrote: >> Hmm.... my mileage varies using latest kernel >> (446cc6345d3de6571bdd0840f48aca441488a28d) >> >> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd) >> ^C[ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ] >> >> $ /tmp/build-perf/perf report -T -i /tmp/perf.data >> # Events: 6 cycles >> # >> # Overhead Command Shared Object Symbol >> # ........ ........ ................. .......................... >> # >> 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel >> 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe >> >> >> # >> # (For a higher level overview, try: perf report --sort comm,dso) >> # >> # PID TID >> >> >> ie., I do not get the counter values. Specifying the counter with -e >> (e.g., -e branch-misses) does not help -- still no counter output. > > Is rsyslogd multithreaded? (Or at least, do the non-main threads > execute any work during your perf-record measurement) If not, then > what you see is consistent with what I'm getting, i.e. everything but > the main thread is reported. It is multithreaded, but my point is that I do not get counter output at the end -- the PID/TID table is empty. I do not get counters for single threaded processes nor for commands run by perf record -- e.g., /tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- sleep 1 David > > Regards, > Wim ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-14 17:48 ` David Ahern @ 2011-05-14 18:52 ` Wim Heirman 2011-05-14 18:57 ` David Ahern 0 siblings, 1 reply; 15+ messages in thread From: Wim Heirman @ 2011-05-14 18:52 UTC (permalink / raw) To: David Ahern Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin 2011/5/14 David Ahern <dsahern@gmail.com>: > On 05/14/11 06:49, Wim Heirman wrote: >>> Hmm.... my mileage varies using latest kernel >>> (446cc6345d3de6571bdd0840f48aca441488a28d) >>> >>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd) >>> ^C[ perf record: Woken up 1 times to write data ] >>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ] >>> >>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data >>> # Events: 6 cycles >>> # >>> # Overhead Command Shared Object Symbol >>> # ........ ........ ................. .......................... >>> # >>> 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel >>> 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe >>> >>> >>> # >>> # (For a higher level overview, try: perf report --sort comm,dso) >>> # >>> # PID TID >>> >>> >>> ie., I do not get the counter values. Specifying the counter with -e >>> (e.g., -e branch-misses) does not help -- still no counter output. >> >> Is rsyslogd multithreaded? (Or at least, do the non-main threads >> execute any work during your perf-record measurement) If not, then >> what you see is consistent with what I'm getting, i.e. everything but >> the main thread is reported. > > It is multithreaded, but my point is that I do not get counter output at > the end -- the PID/TID table is empty. I do not get counters for single > threaded processes nor for commands run by perf record -- e.g., > /tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- > sleep 1 My guess was that none of the threads got scheduled while you did your perf-record run (rsyslogd usually isn't exactly very CPU intensive). And the main thread isn't ever reported, at least that's the bug I'm seeing. Can you try with a compute-intensive, multi-threaded program? Wim. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-14 18:52 ` Wim Heirman @ 2011-05-14 18:57 ` David Ahern 0 siblings, 0 replies; 15+ messages in thread From: David Ahern @ 2011-05-14 18:57 UTC (permalink / raw) To: Wim Heirman Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker, Brice Goglin On 05/14/11 12:52, Wim Heirman wrote: > 2011/5/14 David Ahern <dsahern@gmail.com>: >> On 05/14/11 06:49, Wim Heirman wrote: >>>> Hmm.... my mileage varies using latest kernel >>>> (446cc6345d3de6571bdd0840f48aca441488a28d) >>>> >>>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd) >>>> ^C[ perf record: Woken up 1 times to write data ] >>>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ] >>>> >>>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data >>>> # Events: 6 cycles >>>> # >>>> # Overhead Command Shared Object Symbol >>>> # ........ ........ ................. .......................... >>>> # >>>> 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel >>>> 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe >>>> >>>> >>>> # >>>> # (For a higher level overview, try: perf report --sort comm,dso) >>>> # >>>> # PID TID >>>> >>>> >>>> ie., I do not get the counter values. Specifying the counter with -e >>>> (e.g., -e branch-misses) does not help -- still no counter output. >>> >>> Is rsyslogd multithreaded? (Or at least, do the non-main threads >>> execute any work during your perf-record measurement) If not, then >>> what you see is consistent with what I'm getting, i.e. everything but >>> the main thread is reported. >> >> It is multithreaded, but my point is that I do not get counter output at >> the end -- the PID/TID table is empty. I do not get counters for single >> threaded processes nor for commands run by perf record -- e.g., >> /tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- >> sleep 1 > > My guess was that none of the threads got scheduled while you did your > perf-record run (rsyslogd usually isn't exactly very CPU intensive). > And the main thread isn't ever reported, at least that's the bug I'm > seeing. Can you try with a compute-intensive, multi-threaded program? > > Wim. The reports show data was collected. I have done a number of simple examples -- all of which execute at least 1 instruction, but the counters are not displayed (which they should be per the commit changelog commit). David ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results 2011-05-13 15:14 perf-stat per thread results Wim Heirman 2011-05-13 15:30 ` Ingo Molnar 2011-05-13 15:35 ` David Ahern @ 2011-05-13 20:11 ` Juri Lelli 2 siblings, 0 replies; 15+ messages in thread From: Juri Lelli @ 2011-05-13 20:11 UTC (permalink / raw) To: Wim Heirman; +Cc: linux-kernel Hi, On 05/13/2011 05:14 PM, Wim Heirman wrote: > Hi all, > > I'm using perf-stat to look at hardware performance counters for a > parallel program. Is there a way to get counter values for each thread > individually, rather than aggregated for the whole process? I know I > can attach to a specific thread using --tid=, but due to the time > required to find the tid and attach/detach this isn't accurate for > short-running programs. > Or, alternatively, can I use perf record --stat and get an exact count > for each performance counter? > What about using PAPI library (http://icl.cs.utk.edu/papi/)? It can be built on top of the perf subsystem and allows to get counters values on a pre-thread basis. I used it for some experiments and works very well. Cheers, Juri -- Juri Lelli Via G. Moruzzi 1, 56124 Pisa (PI), Italy Scuola Superiore Sant'Anna TeCIP, ReTiS Lab Web Site: http://retis.sssup.it/~jlelli | Skype: jurile2712 ------------------------------------------------------------- Il male minore non esiste: è sempre il preannuncio di un male peggiore. (Sylos Labini) ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-05-14 20:24 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-05-13 15:14 perf-stat per thread results Wim Heirman 2011-05-13 15:30 ` Ingo Molnar 2011-05-13 15:41 ` David Ahern 2011-05-13 15:35 ` David Ahern 2011-05-13 15:44 ` Ingo Molnar 2011-05-13 20:32 ` Wim Heirman 2011-05-13 20:45 ` Arnaldo Carvalho de Melo 2011-05-14 12:45 ` Wim Heirman 2011-05-14 20:24 ` Wim Heirman 2011-05-13 23:02 ` David Ahern 2011-05-14 12:49 ` Wim Heirman 2011-05-14 17:48 ` David Ahern 2011-05-14 18:52 ` Wim Heirman 2011-05-14 18:57 ` David Ahern 2011-05-13 20:11 ` Juri Lelli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).