linux-kernel.vger.kernel.org archive mirror
* perf-stat per thread results
@ 2011-05-13 15:14 Wim Heirman
  2011-05-13 15:30 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Wim Heirman @ 2011-05-13 15:14 UTC (permalink / raw)
  To: linux-kernel

Hi all,

I'm using perf-stat to look at hardware performance counters for a
parallel program. Is there a way to get counter values for each thread
individually, rather than aggregated for the whole process? I know I
can attach to a specific thread using --tid=, but due to the time
required to find the tid and attach/detach this isn't accurate for
short-running programs.
Or, alternatively, can I use perf record --stat and get an exact count
for each performance counter?

Thanks,
Wim Heirman
Ghent University, Belgium
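
For reference, the thread IDs that --tid= takes can be listed from
/proc/<pid>/task, where each entry is named after one thread. A minimal
sketch, assuming a Linux /proc filesystem; list_tids is a hypothetical
helper name, not part of perf:

```shell
# List the thread IDs (TIDs) of a process, one per line.
# Each directory under /proc/<pid>/task is named after one TID.
list_tids() {
    for dir in /proc/"$1"/task/*; do
        # If the process does not exist the glob stays unexpanded,
        # so check that the entry is really a directory.
        if [ -d "$dir" ]; then
            basename "$dir"
        fi
    done
}

# Example: list the threads of the current shell.
list_tids "$$"
```

Each printed ID could then be passed to --tid=, although, as noted above,
the threads of a short-running program may be gone before the attach
happens.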


* Re: perf-stat per thread results
  2011-05-13 15:14 perf-stat per thread results Wim Heirman
@ 2011-05-13 15:30 ` Ingo Molnar
  2011-05-13 15:41   ` David Ahern
  2011-05-13 15:35 ` David Ahern
  2011-05-13 20:11 ` Juri Lelli
  2 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2011-05-13 15:30 UTC (permalink / raw)
  To: Wim Heirman
  Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Frédéric Weisbecker


* Wim Heirman <wim@heirman.net> wrote:

> Hi all,
> 
> I'm using perf-stat to look at hardware performance counters for a
> parallel program. Is there a way to get counter values for each thread
> individually, rather than aggregated for the whole process? [...]

Not at the moment, but it would be a useful feature.

> [...] I know I can attach to a specific thread using --tid=, but due to the 
> time required to find the tid and attach/detach this isn't accurate for 
> short-running programs. Or, alternatively, can I use perf record --stat and 
> get an exact count for each performance counter?

Yes, perf record --stat should work. 'perf report -T --stdio' is supposed to
print this, but I think it has regressed.

Arnaldo, any ideas?

Thanks,

	Ingo


* Re: perf-stat per thread results
  2011-05-13 15:14 perf-stat per thread results Wim Heirman
  2011-05-13 15:30 ` Ingo Molnar
@ 2011-05-13 15:35 ` David Ahern
  2011-05-13 15:44   ` Ingo Molnar
  2011-05-13 20:11 ` Juri Lelli
  2 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2011-05-13 15:35 UTC (permalink / raw)
  To: Wim Heirman; +Cc: linux-kernel



On 05/13/11 09:14, Wim Heirman wrote:
> Hi all,
> 
> I'm using perf-stat to look at hardware performance counters for a
> parallel program. Is there a way to get counter values for each thread
> individually, rather than aggregated for the whole process? I know I
> can attach to a specific thread using --tid=, but due to the time
> required to find the tid and attach/detach this isn't accurate for
> short-running programs.

perf-stat requires changes to dump counters per thread; it currently
sums all threads into a single value.

> Or, alternatively, can I use perf record --stat and get an exact count
> for each performance counter?

perf-record does not read values from hardware counters.

David



> 
> Thanks,
> Wim Heirman
> Ghent University, Belgium


* Re: perf-stat per thread results
  2011-05-13 15:30 ` Ingo Molnar
@ 2011-05-13 15:41   ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2011-05-13 15:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Wim Heirman, linux-kernel, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Thomas Gleixner,
	Frédéric Weisbecker

On 05/13/11 09:30, Ingo Molnar wrote:

> Yes, perf record --stat should work. 'perf report -T --stdio' is supposed to
> print this, but I think it has regressed.

There is a known regression in 2.6.39; perf-record and perf-top cannot
profile all threads in a process. They fail with:

  Fatal: failed to mmap with 22 (Invalid argument)

Curiously, perf-stat does work - or at least does not fail with the mmap
error.

David


> 
> Arnaldo, any ideas?
> 
> Thanks,
> 
> 	Ingo


* Re: perf-stat per thread results
  2011-05-13 15:35 ` David Ahern
@ 2011-05-13 15:44   ` Ingo Molnar
  2011-05-13 20:32     ` Wim Heirman
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2011-05-13 15:44 UTC (permalink / raw)
  To: David Ahern
  Cc: Wim Heirman, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
	Brice Goglin


* David Ahern <dsahern@gmail.com> wrote:

> On 05/13/11 09:14, Wim Heirman wrote:
> > Hi all,
> > 
> > I'm using perf-stat to look at hardware performance counters for a
> > parallel program. Is there a way to get counter values for each thread
> > individually, rather than aggregated for the whole process? I know I
> > can attach to a specific thread using --tid=, but due to the time
> > required to find the tid and attach/detach this isn't accurate for
> > short-running programs.
> 
> perf-stat requires changes to dump counters per thread; it currently
> sums all threads into a single value.
> 
> > Or, alternatively, can I use perf record --stat and get an exact count
> > for each performance counter?
> 
> perf-record does not read values from hardware counters.

It's supposed to do that if --stat is specified, and it used to work - see this 
commit:

  8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters

and the output there:

     #  PID   TID  cache-misses  cache-references
       4658  4659        495581           3238779
       4658  4662        498246           3236823
       4658  4663        499531           3243162

which appears to be roughly what Wim is asking for, AFAICT.

But this seems to have regressed meanwhile.

Thanks,

	Ingo
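
Given a table in the layout shown above, a per-thread miss ratio follows
directly from the two counter columns. A minimal sketch, assuming the
four-column layout from that commit's output (PID, TID, cache-misses,
cache-references); miss_ratio_per_thread is a hypothetical helper, not a
perf feature:

```shell
# Read a per-thread table (PID, TID, cache-misses, cache-references)
# on stdin and print "TID miss-percentage" for each thread.
miss_ratio_per_thread() {
    awk '
        $1 ~ /^#/ || NF < 4 { next }   # skip the header and blank lines
        $4 > 0 { printf "%s %.2f%%\n", $2, 100 * $3 / $4 }
    '
}

# Example with the rows quoted in this message.
printf '%s\n' \
    '#  PID   TID  cache-misses  cache-references' \
    '  4658  4659        495581           3238779' \
    '  4658  4662        498246           3236823' \
    '  4658  4663        499531           3243162' |
    miss_ratio_per_thread
```

Rows whose reference count is zero are skipped rather than divided.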


* Re: perf-stat per thread results
  2011-05-13 15:14 perf-stat per thread results Wim Heirman
  2011-05-13 15:30 ` Ingo Molnar
  2011-05-13 15:35 ` David Ahern
@ 2011-05-13 20:11 ` Juri Lelli
  2 siblings, 0 replies; 15+ messages in thread
From: Juri Lelli @ 2011-05-13 20:11 UTC (permalink / raw)
  To: Wim Heirman; +Cc: linux-kernel

Hi,

On 05/13/2011 05:14 PM, Wim Heirman wrote:
> Hi all,
>
> I'm using perf-stat to look at hardware performance counters for a
> parallel program. Is there a way to get counter values for each thread
> individually, rather than aggregated for the whole process? I know I
> can attach to a specific thread using --tid=, but due to the time
> required to find the tid and attach/detach this isn't accurate for
> short-running programs.
> Or, alternatively, can I use perf record --stat and get an exact count
> for each performance counter?
>

What about using the PAPI library (http://icl.cs.utk.edu/papi/)?
It can be built on top of the perf subsystem and lets you read counter
values on a per-thread basis.
I used it for some experiments and it works very well.

Cheers,
	Juri

-- 
Juri Lelli
Via G. Moruzzi 1, 56124 Pisa (PI), Italy
Scuola Superiore Sant'Anna
TeCIP, ReTiS Lab

Web Site: http://retis.sssup.it/~jlelli | Skype: jurile2712
-------------------------------------------------------------
There is no lesser evil: it is always the harbinger of a worse one.
							(Sylos Labini)


* Re: perf-stat per thread results
  2011-05-13 15:44   ` Ingo Molnar
@ 2011-05-13 20:32     ` Wim Heirman
  2011-05-13 20:45       ` Arnaldo Carvalho de Melo
  2011-05-13 23:02       ` David Ahern
  0 siblings, 2 replies; 15+ messages in thread
From: Wim Heirman @ 2011-05-13 20:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
	Brice Goglin

Hi,

> It's supposed to do that if --stat is specified, and it used to work - see this
> commit:
>
>  8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>
> and the output there:
>
>     #  PID   TID  cache-misses  cache-references
>       4658  4659        495581           3238779
>       4658  4662        498246           3236823
>       4658  4663        499531           3243162
>
> which appears to be roughly what Wim is asking for, AFAICT.

Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
it works, although if I use --pid rather than the -- <command> variant
the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
first thread is missing in both use cases, and I get one column per
processor (which in itself is fine).

Regards,
Wim


* Re: perf-stat per thread results
  2011-05-13 20:32     ` Wim Heirman
@ 2011-05-13 20:45       ` Arnaldo Carvalho de Melo
  2011-05-14 12:45         ` Wim Heirman
  2011-05-13 23:02       ` David Ahern
  1 sibling, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-05-13 20:45 UTC (permalink / raw)
  To: Wim Heirman
  Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra,
	Thomas Gleixner, Frédéric Weisbecker, Brice Goglin

On Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman wrote:
> > It's supposed to do that if --stat is specified, and it used to work - see this
> > commit:

> >  8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters

> > and the output there:
> >
> >     #  PID   TID  cache-misses  cache-references
> >       4658  4659        495581           3238779
> >       4658  4662        498246           3236823
> >       4658  4663        499531           3243162

> > which appears to be roughly what Wim is asking for, AFAICT.

> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
> it works, although if I use --pid rather than the -- <command> variant
> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
> first thread is missing in both use cases, and I get one column per
> processor (which in itself is fine).

Can you try after applying the patches in this message:

http://marc.info/?l=linux-kernel&m=130385067430510&w=2

and report your results?

If it fixes the problems you're experiencing, please provide a:

Tested-by: Wim Heirman <wim@heirman.net>

So that I can add when sending them to Ingo.

Thanks,

- Arnaldo


* Re: perf-stat per thread results
  2011-05-13 20:32     ` Wim Heirman
  2011-05-13 20:45       ` Arnaldo Carvalho de Melo
@ 2011-05-13 23:02       ` David Ahern
  2011-05-14 12:49         ` Wim Heirman
  1 sibling, 1 reply; 15+ messages in thread
From: David Ahern @ 2011-05-13 23:02 UTC (permalink / raw)
  To: Wim Heirman, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Frédéric Weisbecker, Brice Goglin



On 05/13/11 14:32, Wim Heirman wrote:
> Hi,
> 
>> It's supposed to do that if --stat is specified, and it used to work - see this
>> commit:
>>
>>  8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>>
>> and the output there:
>>
>>     #  PID   TID  cache-misses  cache-references
>>       4658  4659        495581           3238779
>>       4658  4662        498246           3236823
>>       4658  4663        499531           3243162
>>
>> which appears to be roughly what Wim is asking for, AFAICT.
> 
> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
> it works, although if I use --pid rather than the -- <command> variant
> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
> first thread is missing in both use cases, and I get one column per
> processor (which in itself is fine).
> 
> Regards,
> Wim

Hmm.... my mileage varies using latest kernel
(446cc6345d3de6571bdd0840f48aca441488a28d)

$ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]

$ /tmp/build-perf/perf report -T -i /tmp/perf.data
# Events: 6  cycles
#
# Overhead   Command      Shared Object                      Symbol
# ........  ........  .................  ..........................
#
    97.61%  rsyslogd  libc-2.13.so       [.] __libc_disable_asynccancel
     2.39%  rsyslogd  [kernel.kallsyms]  [k] native_write_msr_safe


#
# (For a higher level overview, try: perf report --sort comm,dso)
#
# PID  TID


i.e., I do not get the counter values. Specifying the counter with -e
(e.g., -e branch-misses) does not help -- still no counter output.

David


* Re: perf-stat per thread results
  2011-05-13 20:45       ` Arnaldo Carvalho de Melo
@ 2011-05-14 12:45         ` Wim Heirman
  2011-05-14 20:24           ` Wim Heirman
  0 siblings, 1 reply; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra,
	Thomas Gleixner, Frédéric Weisbecker, Brice Goglin

On 13 May 2011 22:45, Arnaldo Carvalho de Melo <acme@redhat.com> wrote:
> On Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman wrote:
>> > It's supposed to do that if --stat is specified, and it used to work - see this
>> > commit:
>
>> >  8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>
>> > and the output there:
>> >
>> >     #  PID   TID  cache-misses  cache-references
>> >       4658  4659        495581           3238779
>> >       4658  4662        498246           3236823
>> >       4658  4663        499531           3243162
>
>> > which appears to be roughly what Wim is asking for, AFAICT.
>
>> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
>> it works, although if I use --pid rather than the -- <command> variant
>> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
>> first thread is missing in both use cases, and I get one column per
>> processor (which in itself is fine).
>
> Can you try after applying the patches in this message:
>
> http://marc.info/?l=linux-kernel&m=130385067430510&w=2
>
> and report your results?

Sorry, no improvement.

$ ./perf record -e cycles --stat -- ./fft -p4 -m24 && ./perf report --thread | tail
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.198 MB perf.data (~52331 samples) ]
# PID  TID   cpu-clock
  954  958  8067423322
  954  957  6761317556
  954  956  6006327147

$ ls /proc/$(pidof fft)/task
954  956  957  958
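
The task listing contains TID 954, the main thread, which has no row in
the report. A small sketch of that comparison, assuming the report layout
shown above with the TID in the second column; missing_tids is a
hypothetical helper, not part of perf:

```shell
# Print every TID given as an argument that has no row in a
# per-thread report table read from stdin.
missing_tids() {
    awk -v all="$*" '
        $1 ~ /^#/ || NF < 2 { next }   # skip the header and blank lines
        { seen[$2] = 1 }               # column 2 is the TID
        END {
            n = split(all, tid, " ")
            for (i = 1; i <= n; i++)
                if (!(tid[i] in seen)) print tid[i]
        }
    '
}

# Example with the report rows and the /proc listing from this message.
printf '%s\n' \
    '# PID  TID   cpu-clock' \
    '  954  958  8067423322' \
    '  954  957  6761317556' \
    '  954  956  6006327147' |
    missing_tids 954 956 957 958
# prints 954
```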


* Re: perf-stat per thread results
  2011-05-13 23:02       ` David Ahern
@ 2011-05-14 12:49         ` Wim Heirman
  2011-05-14 17:48           ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 12:49 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
	Brice Goglin

On 14 May 2011 01:02, David Ahern <dsahern@gmail.com> wrote:
>
>
> On 05/13/11 14:32, Wim Heirman wrote:
>> Hi,
>>
>>> It's supposed to do that if --stat is specified, and it used to work - see this
>>> commit:
>>>
>>>  8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>>>
>>> and the output there:
>>>
>>>     #  PID   TID  cache-misses  cache-references
>>>       4658  4659        495581           3238779
>>>       4658  4662        498246           3236823
>>>       4658  4663        499531           3243162
>>>
>>> which appears to be roughly what Wim is asking for, AFAICT.
>>
>> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
>> it works, although if I use --pid rather than the -- <command> variant
>> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
>> first thread is missing in both use cases, and I get one column per
>> processor (which in itself is fine).
>>
>> Regards,
>> Wim
>
> Hmm.... my mileage varies using latest kernel
> (446cc6345d3de6571bdd0840f48aca441488a28d)
>
> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>
> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
> # Events: 6  cycles
> #
> # Overhead   Command      Shared Object                      Symbol
> # ........  ........  .................  ..........................
> #
>    97.61%  rsyslogd  libc-2.13.so       [.] __libc_disable_asynccancel
>     2.39%  rsyslogd  [kernel.kallsyms]  [k] native_write_msr_safe
>
>
> #
> # (For a higher level overview, try: perf report --sort comm,dso)
> #
> # PID  TID
>
>
> i.e., I do not get the counter values. Specifying the counter with -e
> (e.g., -e branch-misses) does not help -- still no counter output.

Is rsyslogd multithreaded? (Or at least, do the non-main threads
execute any work during your perf-record measurement) If not, then
what you see is consistent with what I'm getting, i.e. everything but
the main thread is reported.

Regards,
Wim


* Re: perf-stat per thread results
  2011-05-14 12:49         ` Wim Heirman
@ 2011-05-14 17:48           ` David Ahern
  2011-05-14 18:52             ` Wim Heirman
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2011-05-14 17:48 UTC (permalink / raw)
  To: Wim Heirman
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
	Brice Goglin

On 05/14/11 06:49, Wim Heirman wrote:
>> Hmm.... my mileage varies using latest kernel
>> (446cc6345d3de6571bdd0840f48aca441488a28d)
>>
>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>>
>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
>> # Events: 6  cycles
>> #
>> # Overhead   Command      Shared Object                      Symbol
>> # ........  ........  .................  ..........................
>> #
>>    97.61%  rsyslogd  libc-2.13.so       [.] __libc_disable_asynccancel
>>     2.39%  rsyslogd  [kernel.kallsyms]  [k] native_write_msr_safe
>>
>>
>> #
>> # (For a higher level overview, try: perf report --sort comm,dso)
>> #
>> # PID  TID
>>
>>
>> i.e., I do not get the counter values. Specifying the counter with -e
>> (e.g., -e branch-misses) does not help -- still no counter output.
> 
> Is rsyslogd multithreaded? (Or at least, do the non-main threads
> execute any work during your perf-record measurement) If not, then
> what you see is consistent with what I'm getting, i.e. everything but
> the main thread is reported.

It is multithreaded, but my point is that I do not get counter output at
the end -- the PID/TID table is empty. I do not get counters for single
threaded processes nor for commands run by perf record -- e.g.,
/tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- sleep 1

David


> 
> Regards,
> Wim


* Re: perf-stat per thread results
  2011-05-14 17:48           ` David Ahern
@ 2011-05-14 18:52             ` Wim Heirman
  2011-05-14 18:57               ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 18:52 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
	Brice Goglin

2011/5/14 David Ahern <dsahern@gmail.com>:
> On 05/14/11 06:49, Wim Heirman wrote:
>>> Hmm.... my mileage varies using latest kernel
>>> (446cc6345d3de6571bdd0840f48aca441488a28d)
>>>
>>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
>>> ^C[ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>>>
>>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
>>> # Events: 6  cycles
>>> #
>>> # Overhead   Command      Shared Object                      Symbol
>>> # ........  ........  .................  ..........................
>>> #
>>>    97.61%  rsyslogd  libc-2.13.so       [.] __libc_disable_asynccancel
>>>     2.39%  rsyslogd  [kernel.kallsyms]  [k] native_write_msr_safe
>>>
>>>
>>> #
>>> # (For a higher level overview, try: perf report --sort comm,dso)
>>> #
>>> # PID  TID
>>>
>>>
>>> i.e., I do not get the counter values. Specifying the counter with -e
>>> (e.g., -e branch-misses) does not help -- still no counter output.
>>
>> Is rsyslogd multithreaded? (Or at least, do the non-main threads
>> execute any work during your perf-record measurement) If not, then
>> what you see is consistent with what I'm getting, i.e. everything but
>> the main thread is reported.
>
> It is multithreaded, but my point is that I do not get counter output at
> the end -- the PID/TID table is empty. I do not get counters for single
> threaded processes nor for commands run by perf record -- e.g.,
> /tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- sleep 1

My guess was that none of the threads got scheduled while you did your
perf-record run (rsyslogd usually isn't exactly very CPU intensive).
And the main thread isn't ever reported, at least that's the bug I'm
seeing. Can you try with a compute-intensive, multi-threaded program?

Wim.


* Re: perf-stat per thread results
  2011-05-14 18:52             ` Wim Heirman
@ 2011-05-14 18:57               ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2011-05-14 18:57 UTC (permalink / raw)
  To: Wim Heirman
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker,
	Brice Goglin



On 05/14/11 12:52, Wim Heirman wrote:
> 2011/5/14 David Ahern <dsahern@gmail.com>:
>> On 05/14/11 06:49, Wim Heirman wrote:
>>>> Hmm.... my mileage varies using latest kernel
>>>> (446cc6345d3de6571bdd0840f48aca441488a28d)
>>>>
>>>> $ /tmp/build-perf/perf record --stat -fo /tmp/perf.data -p $(pidof rsyslogd)
>>>> ^C[ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (~308 samples) ]
>>>>
>>>> $ /tmp/build-perf/perf report -T -i /tmp/perf.data
>>>> # Events: 6  cycles
>>>> #
>>>> # Overhead   Command      Shared Object                      Symbol
>>>> # ........  ........  .................  ..........................
>>>> #
>>>>    97.61%  rsyslogd  libc-2.13.so       [.] __libc_disable_asynccancel
>>>>     2.39%  rsyslogd  [kernel.kallsyms]  [k] native_write_msr_safe
>>>>
>>>>
>>>> #
>>>> # (For a higher level overview, try: perf report --sort comm,dso)
>>>> #
>>>> # PID  TID
>>>>
>>>>
>>>> i.e., I do not get the counter values. Specifying the counter with -e
>>>> (e.g., -e branch-misses) does not help -- still no counter output.
>>>
>>> Is rsyslogd multithreaded? (Or at least, do the non-main threads
>>> execute any work during your perf-record measurement) If not, then
>>> what you see is consistent with what I'm getting, i.e. everything but
>>> the main thread is reported.
>>
>> It is multithreaded, but my point is that I do not get counter output at
>> the end -- the PID/TID table is empty. I do not get counters for single
>> threaded processes nor for commands run by perf record -- e.g.,
>> /tmp/build-perf/perf record --stat -e instructions -fo /tmp/perf.data -- sleep 1
> 
> My guess was that none of the threads got scheduled while you did your
> perf-record run (rsyslogd usually isn't exactly very CPU intensive).
> And the main thread isn't ever reported, at least that's the bug I'm
> seeing. Can you try with a compute-intensive, multi-threaded program?
> 
> Wim.

The reports show data was collected. I have done a number of simple
examples -- all of which execute at least 1 instruction, but the
counters are not displayed (which they should be, per the commit
changelog).

David


* Re: perf-stat per thread results
  2011-05-14 12:45         ` Wim Heirman
@ 2011-05-14 20:24           ` Wim Heirman
  0 siblings, 0 replies; 15+ messages in thread
From: Wim Heirman @ 2011-05-14 20:24 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, David Ahern, linux-kernel, Peter Zijlstra,
	Thomas Gleixner, Frédéric Weisbecker, Brice Goglin

On 14 May 2011 14:45, Wim Heirman <wim@heirman.net> wrote:
> On 13 May 2011 22:45, Arnaldo Carvalho de Melo <acme@redhat.com> wrote:
>> On Fri, May 13, 2011 at 10:32:58PM +0200, Wim Heirman wrote:
>>> > It's supposed to do that if --stat is specified, and it used to work - see this
>>> > commit:
>>
>>> >  8d51327090ac: perf report: Fix and improve the displaying of per-thread event counters
>>
>>> > and the output there:
>>> >
>>> >     #  PID   TID  cache-misses  cache-references
>>> >       4658  4659        495581           3238779
>>> >       4658  4662        498246           3236823
>>> >       4658  4663        499531           3243162
>>
>>> > which appears to be roughly what Wim is asking for, AFAICT.
>>
>>> Thanks, this is exactly what I'm looking for. In 2.6.32 (Ubuntu 10.04)
>>> it works, although if I use --pid rather than the -- <command> variant
>>> the first thread seems to be missing. In 2.6.38 (Ubuntu 11.04) the
>>> first thread is missing in both use cases, and I get one column per
>>> processor (which in itself is fine).
>>
>> Can you try after applying the patches in this message:
>>
>> http://marc.info/?l=linux-kernel&m=130385067430510&w=2
>>
>> and report your results?
>
> Sorry, no improvement.
>
> $ ./perf record -e cycles --stat -- ./fft -p4 -m24 && ./perf report --thread | tail
> [ perf record: Woken up 5 times to write data ]
> [ perf record: Captured and wrote 1.198 MB perf.data (~52331 samples) ]
> # PID  TID   cpu-clock
>  954  958  8067423322
>  954  957  6761317556
>  954  956  6006327147
>
> $ ls /proc/$(pidof fft)/task
> 954  956  957  958

Looks like perf-report --thread is reading PERF_RECORD_READ events
from perf.data. But these are only emitted by the kernel for child
threads: in kernel/events/core.c, the only call to
perf_event_read_event() is in sync_child_event().
Should perf-record then be adapted to do something like perf-stat does
and use __perf_evsel__read to read the parent counter's final values
and add them to perf.data? That way perf-report can subtract all
children from the final value and get the main thread's counter
values.

Regards,
Wim
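
The subtraction proposed here can be sketched as follows, assuming the
process-wide total is available from somewhere (for example, a final read
of the parent counter, which is what the proposal would add).
main_thread_count is a hypothetical helper and the total in the example is
made up for illustration; only the three child rows come from the report
earlier in this thread:

```shell
# Read a per-thread table (PID, TID, count) on stdin and print
# TOTAL minus the sum of all listed (child) thread counts.
main_thread_count() {
    awk -v total="$1" '
        $1 ~ /^#/ || NF < 3 { next }   # skip the header and blank lines
        { children += $3 }
        END { printf "%.0f\n", total - children }
    '
}

# Hypothetical process-wide total of 28000000000; child rows from the report.
printf '%s\n' \
    '# PID  TID   cpu-clock' \
    '  954  958  8067423322' \
    '  954  957  6761317556' \
    '  954  956  6006327147' |
    main_thread_count 28000000000
# prints 7164931975
```

The result is only as accurate as the total: if the total and the child
values are read at different times, the difference absorbs that skew.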


Thread overview: 15+ messages
2011-05-13 15:14 perf-stat per thread results Wim Heirman
2011-05-13 15:30 ` Ingo Molnar
2011-05-13 15:41   ` David Ahern
2011-05-13 15:35 ` David Ahern
2011-05-13 15:44   ` Ingo Molnar
2011-05-13 20:32     ` Wim Heirman
2011-05-13 20:45       ` Arnaldo Carvalho de Melo
2011-05-14 12:45         ` Wim Heirman
2011-05-14 20:24           ` Wim Heirman
2011-05-13 23:02       ` David Ahern
2011-05-14 12:49         ` Wim Heirman
2011-05-14 17:48           ` David Ahern
2011-05-14 18:52             ` Wim Heirman
2011-05-14 18:57               ` David Ahern
2011-05-13 20:11 ` Juri Lelli
