* perf-stat per thread results
@ 2011-05-13 15:14 Wim Heirman
2011-05-13 15:30 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Wim Heirman @ 2011-05-13 15:14 UTC (permalink / raw)
To: linux-kernel
Hi all,
I'm using perf-stat to look at hardware performance counters for a
parallel program. Is there a way to get counter values for each thread
individually, rather than aggregated for the whole process? I know I
can attach to a specific thread using --tid=, but due to the time
required to find the tid and attach/detach this isn't accurate for
short-running programs.
Or, alternatively, can I use perf record --stat and get an exact count
for each performance counter?
Thanks,
Wim Heirman
Ghent University, Belgium
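
For readers following along, the "find the tid" step mentioned above can be sketched as below. This is only an illustration and not part of perf: it assumes a Linux-style /proc layout in which every thread of a process appears as a directory under /proc/<pid>/task. The helper name list_tids and its proc_root parameter are made up for this example.

```python
import os

def list_tids(pid, proc_root="/proc"):
    """Return the sorted thread IDs of `pid` by listing <proc_root>/<pid>/task.

    Assumption: each entry under task/ is named after one thread ID,
    as on Linux. `proc_root` is a hypothetical knob that makes the
    helper easy to try against a fake directory tree.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())

# Example use on a Linux machine: list_tids(os.getpid())
```

Any thread created after the listing is taken is missed, which is one reason attaching by tid is unreliable for short-running programs.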
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: perf-stat per thread results
2011-05-13 15:14 perf-stat per thread results Wim Heirman
@ 2011-05-13 15:30 ` Ingo Molnar
2011-05-13 15:41 ` David Ahern
2011-05-13 15:35 ` David Ahern
2011-05-13 20:11 ` Juri Lelli
2 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2011-05-13 15:30 UTC (permalink / raw)
To: Wim Heirman
Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo,
Thomas Gleixner, Frédéric Weisbecker
* Wim Heirman <wim@heirman.net> wrote:
> Hi all,
>
> I'm using perf-stat to look at hardware performance counters for a
> parallel program. Is there a way to get counter values for each thread
> individually, rather than aggregated for the whole process? [...]
Not at the moment, but it would be a useful feature.
> [...] I know I can attach to a specific thread using --tid=, but due to the
> time required to find the tid and attach/detach this isn't accurate for
> short-running programs. Or, alternatively, can I use perf record --stat and
> get an exact count for each performance counter?
Yes, perf record --stat should work. 'perf report -T --stdio' is supposed to
print this, but it has regressed, I think.
Arnaldo, any ideas?
Thanks,
Ingo
* Re: perf-stat per thread results
2011-05-13 15:14 perf-stat per thread results Wim Heirman
2011-05-13 15:30 ` Ingo Molnar
@ 2011-05-13 15:35 ` David Ahern
2011-05-13 15:44 ` Ingo Molnar
2011-05-13 20:11 ` Juri Lelli
2 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2011-05-13 15:35 UTC (permalink / raw)
To: Wim Heirman; +Cc: linux-kernel
On 05/13/11 09:14, Wim Heirman wrote:
> Hi all,
>
> I'm using perf-stat to look at hardware performance counters for a
> parallel program. Is there a way to get counter values for each thread
> individually, rather than aggregated for the whole process? I know I
> can attach to a specific thread using --tid=, but due to the time
> required to find the tid and attach/detach this isn't accurate for
> short-running programs.
perf-stat requires changes to dump counters per thread; it currently
sums all threads into a single value.
> Or, alternatively, can I use perf record --stat and get an exact count
> for each performance counter?
perf-record does not read values from hardware counters.
David
>
> Thanks,
> Wim Heirman
> Ghent University, Belgium
* Re: perf-stat per thread results
2011-05-13 15:30 ` Ingo Molnar
@ 2011-05-13 15:41 ` David Ahern
0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2011-05-13 15:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Wim Heirman, linux-kernel, Peter Zijlstra,
Arnaldo Carvalho de Melo, Thomas Gleixner,
Frédéric Weisbecker
On 05/13/11 09:30, Ingo Molnar wrote:
> Yes perf record --stat should work. 'perf report -T --stdio' is supposed to
> print this, but it has regressed, I think.
There is a known regression in 2.6.39; perf-record and perf-top cannot
profile all threads in a process. They fail with:
Fatal: failed to mmap with 22 (Invalid argument)
Curiously, perf-stat does work - or at least does not fail with the mmap
error.
David
>
> Arnaldo, any ideas?
>
> Thanks,
>
> Ingo
* Re: perf-stat per thread results
2011-05-13 15:35 ` David Ahern
@ 2011-05-13 15:44 ` Ingo Molnar
2011-05-13 20:32 ` Wim Heirman
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2011-05-13 15:44 UTC (permalink / raw)
To: David Ahern
Cc: Wim Heirman, linux-kernel, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
Brice Goglin
* David Ahern <dsahern@gmail.com> wrote:
> On 05/13/11 09:14, Wim Heirman wrote:
> > Hi all,
> >
> > I'm using perf-stat to look at hardware performance counters for a
> > parallel program. Is there a way to get counter values for each thread
> > individually, rather than aggregated for the whole process? I know I
> > can attach to a specific thread using --tid=, but due to the time
> > required to find the tid and attach/detach this isn't accurate for
> > short-running programs.
>
> perf-stat requires changes to dump counters per thread; it currently
> sums all threads into a single value.
>
> > Or, alternatively, can I use perf record --stat and get an exact count
> > for each performance counter?
>
> perf-record does not read values from hardware counters.
It's supposed to do that if --stat is specified, and it used to work - see this
commit:
8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
and the output there:
#  PID   TID  cache-misses  cache-references
  4658  4659        495581           3238779
  4658  4662        498246           3236823
  4658  4663        499531           3243162
which appears to be roughly what Wim is asking for, AFAICT.
But this seems to have regressed meanwhile.
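
As an aside, a table in the format shown above is easy to post-process. The following is a minimal sketch, not anything shipped with perf: it assumes the layout is exactly a "# PID TID <event>..." header followed by whitespace-separated integer rows, one column per event. The function name parse_thread_table is made up for this example.

```python
def parse_thread_table(text):
    """Parse a '# PID TID <event>...' table into a list of per-thread dicts."""
    rows, events = [], None
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["#", "PID", "TID"]:
            # Event names are whatever follows the TID column in the header.
            events = fields[3:]
        elif (events is not None
              and len(fields) == 2 + len(events)
              and all(f.isdigit() for f in fields)):
            pid, tid, *counts = map(int, fields)
            rows.append({"pid": pid, "tid": tid,
                         "counts": dict(zip(events, counts))})
    return rows

# Sample taken from the commit output quoted in this message.
sample = """\
#  PID   TID  cache-misses  cache-references
  4658  4659        495581           3238779
  4658  4662        498246           3236823
  4658  4663        499531           3243162
"""

for row in parse_thread_table(sample):
    print(row["tid"], row["counts"])
```

Lines that do not match the header or the expected column count are skipped, so the rest of the report output can be fed in unchanged.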
Thanks,
Ingo
* Re: perf-stat per thread results
2011-05-13 15:14 perf-stat per thread results Wim Heirman
2011-05-13 15:30 ` Ingo Molnar
2011-05-13 15:35 ` David Ahern
@ 2011-05-13 20:11 ` Juri Lelli
2 siblings, 0 replies; 15+ messages in thread
From: Juri Lelli @ 2011-05-13 20:11 UTC (permalink / raw)
To: Wim Heirman; +Cc: linux-kernel
Hi,
On 05/13/2011 05:14 PM, Wim Heirman wrote:
> Hi all,
>
> I'm using perf-stat to look at hardware performance counters for a
> parallel program. Is there a way to get counter values for each thread
> individually, rather than aggregated for the whole process? I know I
> can attach to a specific thread using --tid=, but due to the time
> required to find the tid and attach/detach this isn't accurate for
> short-running programs.
> Or, alternatively, can I use perf record --stat and get an exact count
> for each performance counter?
>
What about using the PAPI library (http://icl.cs.utk.edu/papi/)?
It can be built on top of the perf subsystem and lets you read counter
values on a per-thread basis.
I used it for some experiments and it works very well.
Cheers,
Juri
--
Juri Lelli
Via G. Moruzzi 1, 56124 Pisa (PI), Italy
Scuola Superiore Sant'Anna
TeCIP, ReTiS Lab
Web Site: http://retis.sssup.it/~jlelli | Skype: jurile2712
-------------------------------------------------------------
The lesser evil does not exist: it is always the harbinger of a worse evil.
(Sylos Labini)
* Re: perf-stat per thread results
2011-05-13 15:44 ` Ingo Molnar
@ 2011-05-13 20:32 ` Wim Heirman
2011-05-13 20:45 ` Arnaldo Carvalho de Melo
2011-05-13 23:02 ` David Ahern
0 siblings, 2 replies; 15+ messages in thread
From: Wim Heirman @ 2011-05-13 20:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Ahern, linux-kernel, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
Brice Goglin
Hi,
> It's supposed to do that if --stat is specified, and it used to work - see this
> commit:
>
> 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>
> and the output there:
>
> # PID TID cache-misses cache-references
> 4658 4659 495581 3238779
> 4658 4662 498246 3236823
> 4658 4663 499531 3243162
>
> which appears to be roughly what Wim is asking for, AFAICT.
Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
it works, although if I use --pid rather than the -- <command> variant
the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
first thread is missing in both use cases, and I get one column per
processor (which in itself is fine).
Regards,
Wim
* Re: perf-stat per thread results
2011-05-13 20:32 ` Wim Heirman
@ 2011-05-13 20:45 ` Arnaldo Carvalho de Melo
2011-05-14 12:45 ` Wim Heirman
2011-05-13 23:02 ` David Ahern
1 sibling, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-05-13 20:45 UTC (permalink / raw)
To: Wim Heirman
Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra,
Thomas Gleixner, Frédéric Weisbecker, Brice Goglin
On Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman wrote:
> > It's supposed to do that if --stat is specified, and it used to work - see this
> > commit:
> > 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
> > and the output there:
> >
> > # PID TID cache-misses cache-references
> > 4658 4659 495581 3238779
> > 4658 4662 498246 3236823
> > 4658 4663 499531 3243162
> > which appears to be roughly what Wim is asking for, AFAICT.
> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
> it works, although if I use --pid rather than the -- <command> variant
> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
> first thread is missing in both use cases, and I get one column per
> processor (which in itself is fine).
Can you try after applying the patches in this message:
http://marc.info/?l=linux-kernel&m=130385067430510&w=2
and report your results?
If it fixes the problems you're experiencing, please provide a:
Tested-by: Wim Heirman <wim@heirman.net>
So that I can add when sending them to Ingo.
Thanks,
- Arnaldo
* Re: perf-stat per thread results
2011-05-13 20:32 ` Wim Heirman
2011-05-13 20:45 ` Arnaldo Carvalho de Melo
@ 2011-05-13 23:02 ` David Ahern
2011-05-14 12:49 ` Wim Heirman
1 sibling, 1 reply; 15+ messages in thread
From: David Ahern @ 2011-05-13 23:02 UTC (permalink / raw)
To: Wim Heirman, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner,
Frédéric Weisbecker, Brice Goglin
On 05/13/11 14:32, Wim Heirman wrote:
> Hi,
>
>> It's supposed to do that if --stat is specified, and it used to work - see this
>> commit:
>>
>> 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>>
>> and the output there:
>>
>> # PID TID cache-misses cache-references
>> 4658 4659 495581 3238779
>> 4658 4662 498246 3236823
>> 4658 4663 499531 3243162
>>
>> which appears to be roughly what Wim is asking for, AFAICT.
>
> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
> it works, although if I use --pid rather than the -- <command> variant
> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
> first thread is missing in both use cases, and I get one column per
> processor (which in itself is fine).
>
> Regards,
> Wim
Hmm.... my mileage varies using latest kernel
(446cc6345d3de6571bdd0840f48aca441488a28d)
$ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
$ /tmp/build-perf/perf report -T -i /tmp/perf.data
# Events: 6 cycles
#
# Overhead Command Shared Object Symbol
# ........ ........ ................. ..........................
#
97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel
2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
# PID TID
i.e., I do not get the counter values. Specifying the counter with -e
(e.g., -e branch-misses) does not help -- still no counter output.
David
* Re: perf-stat per thread results
2011-05-13 20:45 ` Arnaldo Carvalho de Melo
@ 2011-05-14 12:45 ` Wim Heirman
2011-05-14 20:24 ` Wim Heirman
0 siblings, 1 reply; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 12:45 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra,
Thomas Gleixner, Frédéric Weisbecker, Brice Goglin
On 13 May 2011 22:45, Arnaldo Carvalho de Melo <acme@redhat.com> wrote:
> On Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman wrote:
>> > It's supposed to do that if --stat is specified, and it used to work - see this
>> > commit:
>
>> > 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>
>> > and the output there:
>> >
>> > # PID TID cache-misses cache-references
>> > 4658 4659 495581 3238779
>> > 4658 4662 498246 3236823
>> > 4658 4663 499531 3243162
>
>> > which appears to be roughly what Wim is asking for, AFAICT.
>
>> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
>> it works, although if I use --pid rather than the -- <command> variant
>> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
>> first thread is missing in both use cases, and I get one column per
>> processor (which in itself is fine).
>
> Can you try after applying the patches in this message:
>
> http://marc.info/?l=linux-kernel&m=130385067430510&w=2
>
> and report your results?
Sorry, no improvement.
$ ./perf record -e cycles --stat -- ./fft -p4 -m24 && ./perf report --thread | tail
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.198 MB perf.data (~52331 samples) ]
# PID TID cpu-clock
954 958 8067423322
954 957 6761317556
954 956 6006327147
$ ls /proc/$(pidof fft)/task
954 956 957 958
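
To make the mismatch explicit: comparing the thread IDs under /proc/<pid>/task with the TIDs in the report shows which thread is missing. The snippet below is only an illustrative sketch using the numbers from the output above; the function name missing_tids is made up for this example.

```python
def missing_tids(task_tids, reported_tids):
    """Return the thread IDs present in the task list but absent from the report."""
    return sorted(set(task_tids) - set(reported_tids))

task_tids = [954, 956, 957, 958]   # from: ls /proc/$(pidof fft)/task
reported_tids = [958, 957, 956]    # TID column of the perf report output

print(missing_tids(task_tids, reported_tids))  # [954]
```

The only TID that is not reported, 954, equals the PID, i.e. it is the main thread.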
* Re: perf-stat per thread results
2011-05-13 23:02 ` David Ahern
@ 2011-05-14 12:49 ` Wim Heirman
2011-05-14 17:48 ` David Ahern
0 siblings, 1 reply; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 12:49 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
Brice Goglin
On 14 May 2011 01:02, David Ahern <dsahern@gmail.com> wrote:
>
>
> On 05/13/11 14:32, Wim Heirman wrote:
>> Hi,
>>
>>> It's supposed to do that if --stat is specified, and it used to work - see this
>>> commit:
>>>
>>> 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>>>
>>> and the output there:
>>>
>>> # PID TID cache-misses cache-references
>>> 4658 4659 495581 3238779
>>> 4658 4662 498246 3236823
>>> 4658 4663 499531 3243162
>>>
>>> which appears to be roughly what Wim is asking for, AFAICT.
>>
>> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
>> it works, although if I use --pid rather than the -- <command> variant
>> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
>> first thread is missing in both use cases, and I get one column per
>> processor (which in itself is fine).
>>
>> Regards,
>> Wim
>
> Hmm.... my mileage varies using latest kernel
> (446cc6345d3de6571bdd0840f48aca441488a28d)
>
> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>
> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
> # Events: 6 cycles
> #
> # Overhead Command Shared Object Symbol
> # ........ ........ ................. ..........................
> #
> 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel
> 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe
>
>
> #
> # (For a higher level overview, try: perf report --sort comm,dso)
> #
> # PID TID
>
>
> i.e., I do not get the counter values. Specifying the counter with -e
> (e.g., -e branch-misses) does not help -- still no counter output.
Is rsyslogd multithreaded? (Or at least, do the non-main threads
execute any work during your perf-record measurement) If not, then
what you see is consistent with what I'm getting, i.e. everything but
the main thread is reported.
Regards,
Wim
* Re: perf-stat per thread results
2011-05-14 12:49 ` Wim Heirman
@ 2011-05-14 17:48 ` David Ahern
2011-05-14 18:52 ` Wim Heirman
0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2011-05-14 17:48 UTC (permalink / raw)
To: Wim Heirman
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
Brice Goglin
On 05/14/11 06:49, Wim Heirman wrote:
>> Hmm.... my mileage varies using latest kernel
>> (446cc6345d3de6571bdd0840f48aca441488a28d)
>>
>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>>
>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
>> # Events: 6 cycles
>> #
>> # Overhead Command Shared Object Symbol
>> # ........ ........ ................. ..........................
>> #
>> 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel
>> 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe
>>
>>
>> #
>> # (For a higher level overview, try: perf report --sort comm,dso)
>> #
>> # PID TID
>>
>>
>> i.e., I do not get the counter values. Specifying the counter with -e
>> (e.g., -e branch-misses) does not help -- still no counter output.
>
> Is rsyslogd multithreaded? (Or at least, do the non-main threads
> execute any work during your perf-record measurement) If not, then
> what you see is consistent with what I'm getting, i.e. everything but
> the main thread is reported.
It is multithreaded, but my point is that I do not get counter output at
the end -- the PID/TID table is empty. I do not get counters for single
threaded processes nor for commands run by perf record -- e.g.,
/tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- sleep 1
David
>
> Regards,
> Wim
* Re: perf-stat per thread results
2011-05-14 17:48 ` David Ahern
@ 2011-05-14 18:52 ` Wim Heirman
2011-05-14 18:57 ` David Ahern
0 siblings, 1 reply; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 18:52 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
Brice Goglin
2011/5/14 David Ahern <dsahern@gmail.com>:
> On 05/14/11 06:49, Wim Heirman wrote:
>>> Hmm.... my mileage varies using latest kernel
>>> (446cc6345d3de6571bdd0840f48aca441488a28d)
>>>
>>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
>>> ^C[ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>>>
>>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
>>> # Events: 6 cycles
>>> #
>>> # Overhead Command Shared Object Symbol
>>> # ........ ........ ................. ..........................
>>> #
>>> 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel
>>> 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe
>>>
>>>
>>> #
>>> # (For a higher level overview, try: perf report --sort comm,dso)
>>> #
>>> # PID TID
>>>
>>>
>>> i.e., I do not get the counter values. Specifying the counter with -e
>>> (e.g., -e branch-misses) does not help -- still no counter output.
>>
>> Is rsyslogd multithreaded? (Or at least, do the non-main threads
>> execute any work during your perf-record measurement) If not, then
>> what you see is consistent with what I'm getting, i.e. everything but
>> the main thread is reported.
>
> It is multithreaded, but my point is that I do not get counter output at
> the end -- the PID/TID table is empty. I do not get counters for single
> threaded processes nor for commands run by perf record -- e.g.,
> /tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- sleep 1
My guess was that none of the threads got scheduled while you did your
perf-record run (rsyslogd usually isn't exactly very CPU intensive).
And the main thread isn't ever reported, at least that's the bug I'm
seeing. Can you try with a compute-intensive, multi-threaded program?
Wim.
* Re: perf-stat per thread results
2011-05-14 18:52 ` Wim Heirman
@ 2011-05-14 18:57 ` David Ahern
0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2011-05-14 18:57 UTC (permalink / raw)
To: Wim Heirman
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
Brice Goglin
On 05/14/11 12:52, Wim Heirman wrote:
> 2011/5/14 David Ahern <dsahern@gmail.com>:
>> On 05/14/11 06:49, Wim Heirman wrote:
>>>> Hmm.... my mileage varies using latest kernel
>>>> (446cc6345d3de6571bdd0840f48aca441488a28d)
>>>>
>>>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
>>>> ^C[ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>>>>
>>>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
>>>> # Events: 6 cycles
>>>> #
>>>> # Overhead Command Shared Object Symbol
>>>> # ........ ........ ................. ..........................
>>>> #
>>>> 97.61% rsyslogd libc-2.13.so [.] __libc_disable_asynccancel
>>>> 2.39% rsyslogd [kernel.kallsyms] [k] native_write_msr_safe
>>>>
>>>>
>>>> #
>>>> # (For a higher level overview, try: perf report --sort comm,dso)
>>>> #
>>>> # PID TID
>>>>
>>>>
>>>> i.e., I do not get the counter values. Specifying the counter with -e
>>>> (e.g., -e branch-misses) does not help -- still no counter output.
>>>
>>> Is rsyslogd multithreaded? (Or at least, do the non-main threads
>>> execute any work during your perf-record measurement) If not, then
>>> what you see is consistent with what I'm getting, i.e. everything but
>>> the main thread is reported.
>>
>> It is multithreaded, but my point is that I do not get counter output at
>> the end -- the PID/TID table is empty. I do not get counters for single
>> threaded processes nor for commands run by perf record -- e.g.,
>> /tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- sleep 1
>
> My guess was that none of the threads got scheduled while you did your
> perf-record run (rsyslogd usually isn't exactly very CPU intensive).
> And the main thread isn't ever reported, at least that's the bug I'm
> seeing. Can you try with a compute-intensive, multi-threaded program?
>
> Wim.
The reports show data was collected. I have done a number of simple
examples -- all of which execute at least 1 instruction, but the
counters are not displayed (which they should be, per the commit
changelog).
David
* Re: perf-stat per thread results
2011-05-14 12:45 ` Wim Heirman
@ 2011-05-14 20:24 ` Wim Heirman
0 siblings, 0 replies; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 20:24 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra,
Thomas Gleixner, Frédéric Weisbecker, Brice Goglin
On 14 May 2011 14:45, Wim Heirman <wim@heirman.net> wrote:
> On 13 May 2011 22:45, Arnaldo Carvalho de Melo <acme@redhat.com> wrote:
>> On Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman wrote:
>>> > It's supposed to do that if --stat is specified, and it used to work - see this
>>> > commit:
>>
>>> > 8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>>
>>> > and the output there:
>>> >
>>> > # PID TID cache-misses cache-references
>>> > 4658 4659 495581 3238779
>>> > 4658 4662 498246 3236823
>>> > 4658 4663 499531 3243162
>>
>>> > which appears to be roughly what Wim is asking for, AFAICT.
>>
>>> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
>>> it works, although if I use --pid rather than the -- <command> variant
>>> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
>>> first thread is missing in both use cases, and I get one column per
>>> processor (which in itself is fine).
>>
>> Can you try after applying the patches in this message:
>>
>> http://marc.info/?l=linux-kernel&m=130385067430510&w=2
>>
>> and report your results?
>
> Sorry, no improvement.
>
> $ ./perf record -e cycles --stat -- ./fft -p4 -m24 && ./perf report --thread | tail
> [ perf record: Woken up 5 times to write data ]
> [ perf record: Captured and wrote 1.198 MB perf.data (~52331 samples) ]
> # PID TID cpu-clock
> 954 958 8067423322
> 954 957 6761317556
> 954 956 6006327147
>
> $ ls /proc/$(pidof fft)/task
> 954 956 957 958
Looks like perf-report --thread is reading PERF_RECORD_READ events
from perf.data. But these are only emitted by the kernel for child
threads: in kernel/events/core.c, the only call to
perf_event_read_event() is in sync_child_event().
Should perf-record then be adapted to do something like perf-stat does
and use __perf_evsel__read to read the parent counter's final values
and add them to perf.data? That way perf-report can subtract all
children from the final value and get the main thread's counter
values.
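
The arithmetic behind this proposal can be sketched as follows. It rests on the assumption stated above: that the parent counter's final value covers the whole process, child threads included. The child counts are the ones from my earlier report; the process total is a made-up number for illustration only, and main_thread_count is a hypothetical helper, not an existing perf function.

```python
def main_thread_count(process_total, child_counts):
    """Estimate the main thread's count as the process total minus all children.

    Assumption: `process_total` is the parent counter's final value and
    already includes the counts inherited from every child thread.
    """
    remainder = process_total - sum(child_counts.values())
    if remainder < 0:
        raise ValueError("children exceed the process total; inputs are inconsistent")
    return remainder

# Per-child values as they appeared in the perf report --thread output.
child_counts = {958: 8067423322, 957: 6761317556, 956: 6006327147}

# Hypothetical process-wide total, chosen only to make the example concrete.
process_total = 27_000_000_000

print(main_thread_count(process_total, child_counts))
```

If the parent's value did not include the children, no subtraction would be needed and the final value could be reported directly for the main thread.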
Regards,
Wim