Reordering the thread output in perf trace --summary

linux-perf-users.vger.kernel.org archive mirror

* Reordering the thread output in perf trace --summary
@ 2016-05-04  9:02 Milian Wolff
  2016-05-04  9:51 ` Milian Wolff
  0 siblings, 1 reply; 8+ messages in thread
From: Milian Wolff @ 2016-05-04  9:02 UTC (permalink / raw)
  To: linux-perf-users@vger.kernel.org


Hey all,

when using `perf trace --summary` on a (badly designed) user application that 
creates tons of threads, the usually interesting overall summary is drowned by 
the per-thread summary output. I.e.:

perf trace --summary lab_mandelbrot_concurrent |& grep events
 lab_mandelbrot_ (19497), 9246 events, 25.7%, 0.000 msec
 QXcbEventReader (19498), 1094 events, 3.0%, 0.000 msec
 QDBusConnection (19499), 132 events, 0.4%, 0.000 msec
 Thread (pooled) (19500), 1982 events, 5.5%, 0.000 msec
 Thread (pooled) (19501), 114 events, 0.3%, 0.000 msec
 lab_mandelbrot_ (19502), 88 events, 0.2%, 0.000 msec
 Thread (pooled) (19503), 106 events, 0.3%, 0.000 msec
 Thread (pooled) (19504), 101 events, 0.3%, 0.000 msec
 Thread (pooled) (19505), 102 events, 0.3%, 0.000 msec
... continued for a total of 163 lines

usually, I forget to pipe the output of `perf trace --summary` into a file and 
then have to rerun the command, as the total output (2643 lines!) easily 
exceeds my scrollback buffer.

I would like to propose to reorder the output to sort the output in ascending 
total event order, such that the most interesting output is shown at the 
bottom of the output on the CLI. I.e. in the output above it should be 
something like

perf trace --summary lab_mandelbrot_concurrent |& grep events
... continued for a total of 163 lines
 lab_mandelbrot_ (19502), 88 events, 0.2%, 0.000 msec
 Thread (pooled) (19504), 101 events, 0.3%, 0.000 msec
 Thread (pooled) (19505), 102 events, 0.3%, 0.000 msec
 Thread (pooled) (19503), 106 events, 0.3%, 0.000 msec
 Thread (pooled) (19501), 114 events, 0.3%, 0.000 msec
 QDBusConnection (19499), 132 events, 0.4%, 0.000 msec
 QXcbEventReader (19498), 1094 events, 3.0%, 0.000 msec
 Thread (pooled) (19500), 1982 events, 5.5%, 0.000 msec
 lab_mandelbrot_ (19497), 9246 events, 25.7%, 0.000 msec

If this is acceptable to you, can someone please tell me how to do such a 
seemingly simple task in C? In C++ I'd expect to add a simple std::sort 
somewhere, but in perf's C...? My current idea would be to run 
machine__for_each_thread and store the event count + thread pointer in another 
temporary buffer, which I then qsort and finally iterate over. Does that sound 
OK, or how would you approach this task?
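
A minimal sketch of the buffer-plus-qsort idea described above, in plain C. The `thread_entry` struct and both function names are hypothetical stand-ins, not perf's actual data structures; in perf itself the buffer would presumably be filled from a machine__for_each_thread callback, which is not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread data collected in the buffer. */
struct thread_entry {
	int tid;
	unsigned long nr_events;
};

/* qsort comparator: ascending event count, ties broken by tid. */
static int thread_entry_cmp(const void *a, const void *b)
{
	const struct thread_entry *ta = a, *tb = b;

	if (ta->nr_events != tb->nr_events)
		return ta->nr_events < tb->nr_events ? -1 : 1;
	return (ta->tid > tb->tid) - (ta->tid < tb->tid);
}

/* Sort the temporary buffer so the busiest thread ends up last. */
static void sort_threads_by_events(struct thread_entry *entries, size_t nr)
{
	assert(entries != NULL || nr == 0);
	if (nr > 1)
		qsort(entries, nr, sizeof(*entries), thread_entry_cmp);
}
```

After sorting, iterating the buffer front to back prints the least active threads first, so the most active one lands at the bottom of the terminal output.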

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: Reordering the thread output in perf trace --summary
  2016-05-04  9:02 Reordering the thread output in perf trace --summary Milian Wolff
@ 2016-05-04  9:51 ` Milian Wolff
  2016-05-04 21:41   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Milian Wolff @ 2016-05-04  9:51 UTC (permalink / raw)
  To: linux-perf-users@vger.kernel.org


On Wednesday, May 4, 2016 11:02:12 AM CEST Milian Wolff wrote:
> Hey all,
> 
> when using `perf trace --summary` on a (badly designed) user application
> that creates tons of threads, the usually interesting overall summary is
> drowned by the per-thread summary output. I.e.:
> 
> perf trace --summary lab_mandelbrot_concurrent |& grep events
>  lab_mandelbrot_ (19497), 9246 events, 25.7%, 0.000 msec
>  QXcbEventReader (19498), 1094 events, 3.0%, 0.000 msec
>  QDBusConnection (19499), 132 events, 0.4%, 0.000 msec
>  Thread (pooled) (19500), 1982 events, 5.5%, 0.000 msec
>  Thread (pooled) (19501), 114 events, 0.3%, 0.000 msec
>  lab_mandelbrot_ (19502), 88 events, 0.2%, 0.000 msec
>  Thread (pooled) (19503), 106 events, 0.3%, 0.000 msec
>  Thread (pooled) (19504), 101 events, 0.3%, 0.000 msec
>  Thread (pooled) (19505), 102 events, 0.3%, 0.000 msec
> ... continued for a total of 163 lines
> 
> usually, I forget to pipe the output of `perf trace --summary` into a file
> and then have to rerun the command, as the total output (2643 lines!)
> easily exceeds my scrollback buffer.
> 
> I would like to propose to reorder the output to sort the output in
> ascending total event order, such that the most interesting output is shown
> at the bottom of the output on the CLI. I.e. in the output above it should
> be something like
> 
> perf trace --summary lab_mandelbrot_concurrent |& grep events
> ... continued for a total of 163 lines
>  lab_mandelbrot_ (19502), 88 events, 0.2%, 0.000 msec
>  Thread (pooled) (19504), 101 events, 0.3%, 0.000 msec
>  Thread (pooled) (19505), 102 events, 0.3%, 0.000 msec
>  Thread (pooled) (19503), 106 events, 0.3%, 0.000 msec
>  Thread (pooled) (19501), 114 events, 0.3%, 0.000 msec
>  QDBusConnection (19499), 132 events, 0.4%, 0.000 msec
>  QXcbEventReader (19498), 1094 events, 3.0%, 0.000 msec
>  Thread (pooled) (19500), 1982 events, 5.5%, 0.000 msec
>  lab_mandelbrot_ (19497), 9246 events, 25.7%, 0.000 msec
> 
> If this is acceptable to you, can someone please tell me how to do such a
> seemingly simple task in C? In C++ I'd expect to add a simple std::sort
> somewhere, but in perf's C...? My current idea would be to run
> machine__for_each_thread and store the event count + thread pointer in
> another temporary buffer, which I then qsort and finally iterate over. Does
> that sound OK, or how would you approach this task?

While at it, can we similarly reorder the output of the per-thread syscall 
list? At the moment it is e.g.:

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   read                 166     0.332     0.001     0.002     0.031     10.22%
   write                 13     0.038     0.002     0.003     0.006     12.41%
   open                 448     1.189     0.001     0.003     0.020      1.94%
   close                185     0.270     0.001     0.001     0.022      7.78%
   stat                 507     0.823     0.001     0.002     0.009      2.34%
   fstat                215     0.211     0.001     0.001     0.001      1.00%
   lstat                317     0.469     0.001     0.001     0.003      1.42%
   poll                 176     0.534     0.001     0.003     0.169     32.22%
   lseek                  1     0.001     0.001     0.001     0.001      0.00%
   mmap                 384     1.184     0.002     0.003     0.006      1.20%
   mprotect             238     0.949     0.001     0.004     0.013      1.96%
   munmap                42     0.501     0.002     0.012     0.107     27.58%
   brk                   12     0.042     0.001     0.004     0.013     26.16%
   rt_sigaction           2     0.002     0.001     0.001     0.001     12.90%
   rt_sigprocmask         1     0.001     0.001     0.001     0.001      0.00%
   writev               165     0.387     0.002     0.002     0.005      1.57%
   access               156     0.250     0.001     0.002     0.011      4.88%
   socket                 2     0.012     0.005     0.006     0.007     12.05%
   connect                2     0.014     0.005     0.007     0.009     25.14%
   recvfrom               4     0.014     0.002     0.003     0.008     45.24%
   recvmsg               16     0.029     0.001     0.002     0.004     12.30%
   shutdown               1     0.004     0.004     0.004     0.004      0.00%
   getsockname            1     0.001     0.001     0.001     0.001      0.00%
   getpeername            1     0.002     0.002     0.002     0.002      0.00%
   getsockopt             1     0.002     0.002     0.002     0.002      0.00%
   clone                 34     7.506     0.207     0.221     0.295      1.49%
   uname                  2     0.003     0.001     0.001     0.001      4.12%
   fcntl                 32     0.032     0.001     0.001     0.001      2.53%
   getdents              16     0.057     0.001     0.004     0.007     15.32%
   readlink              11     0.020     0.001     0.002     0.005     19.02%
   getrlimit              1     0.001     0.001     0.001     0.001      0.00%
   getuid                 2     0.002     0.001     0.001     0.001     15.24%
   getgid                 1     0.001     0.001     0.001     0.001      0.00%
   geteuid                2     0.002     0.001     0.001     0.001     23.66%
   getegid                1     0.001     0.001     0.001     0.001      0.00%
   statfs                 8     0.020     0.002     0.002     0.004     10.60%
   arch_prctl             1     0.001     0.001     0.001     0.001      0.00%
   futex                489  1466.240     0.001     2.998  1447.978     98.75%
   set_tid_address        1     0.001     0.001     0.001     0.001      0.00%
   clock_getres           1     0.001     0.001     0.001     0.001      0.00%
   set_robust_list        1     0.001     0.001     0.001     0.001      0.00%

This output is not sorted by syscall name, nor by number of calls or total or 
anything... Could we maybe sort it by total msecs by default? Or maybe by 
syscall name and then offer the user a way to sort it by calls/total msecs 
instead?
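
To illustrate the proposed default, here is a small sketch that orders syscall rows by total msecs, largest first, falling back to the syscall name on ties. The `syscall_row` struct and the function names are made up for this example and do not reflect how perf trace stores its statistics.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical row of the per-thread syscall table. */
struct syscall_row {
	const char *name;
	unsigned long calls;
	double total_msec;
};

/* Descending total time; equal totals are ordered by name. */
static int row_cmp_total_desc(const void *a, const void *b)
{
	const struct syscall_row *ra = a, *rb = b;

	if (ra->total_msec != rb->total_msec)
		return ra->total_msec < rb->total_msec ? 1 : -1;
	return strcmp(ra->name, rb->name);
}

static void sort_rows_by_total(struct syscall_row *rows, size_t nr)
{
	assert(rows != NULL);
	qsort(rows, nr, sizeof(*rows), row_cmp_total_desc);
}
```

With the table above, futex (1466.240 msec) would come first and clone (7.506 msec) second, which puts the dominating syscalls at the top of each per-thread block.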

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: Reordering the thread output in perf trace --summary
  2016-05-04  9:51 ` Milian Wolff
@ 2016-05-04 21:41   ` Arnaldo Carvalho de Melo
  2016-05-05 16:04     ` [DONE] " Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-05-04 21:41 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users@vger.kernel.org

On Wed, May 04, 2016 at 11:51:04AM +0200, Milian Wolff wrote:
> On Wednesday, May 4, 2016 11:02:12 AM CEST Milian Wolff wrote:
> > I would like to propose to reorder the output to sort the output in
> > ascending total event order, such that the most interesting output is shown
> > at the bottom of the output on the CLI. I.e. in the output above it should
> > be something like

> > perf trace --summary lab_mandelbrot_concurrent |& grep events
> > ... continued for a total of 163 lines
> >  lab_mandelbrot_ (19502), 88 events, 0.2%, 0.000 msec
> >  Thread (pooled) (19504), 101 events, 0.3%, 0.000 msec
> >  Thread (pooled) (19505), 102 events, 0.3%, 0.000 msec
> >  Thread (pooled) (19503), 106 events, 0.3%, 0.000 msec
> >  Thread (pooled) (19501), 114 events, 0.3%, 0.000 msec
> >  QDBusConnection (19499), 132 events, 0.4%, 0.000 msec
> >  QXcbEventReader (19498), 1094 events, 3.0%, 0.000 msec
> >  Thread (pooled) (19500), 1982 events, 5.5%, 0.000 msec
> >  lab_mandelbrot_ (19497), 9246 events, 25.7%, 0.000 msec

> > If this is acceptable to you, can someone please tell me how to do such a
> > seemingly simple task in C? In C++ I'd expect to add a simple std::sort
> > somewhere, but in perf's C...? My current idea would be to run
> > machine__for_each_thread and store the event count + thread pointer in
> > another temporary buffer, which I then qsort and finally iterate over. Does
> > that sound OK, or how would you approach this task?
 
> While at it, can we similarly reorder the output of the per-thread syscall 
> list? At the moment it is e.g.:

Take a look at my perf/core branch, I have it working there.

I'm in the process of experimenting with creating some kind of template
for resorting rb_trees, that will reduce the boilerplate while keeping
it following the principles described in Documentation/rbtree.txt.
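
The general idea of resorting, i.e. re-inserting existing entries into a second tree ordered by a different key, can be sketched as follows. This is not the perf/core code: a plain unbalanced binary search tree stands in for the kernel's rb_tree, and `stat_node`, `resort_insert` and `resort_walk` are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: 'id' is the original key, 'total_msec' the new one. */
struct stat_node {
	int id;
	double total_msec;
	struct stat_node *left, *right;
};

/*
 * Insert into a tree ordered by total_msec so that an in-order walk
 * visits the largest totals first. A real rb_tree would also rebalance.
 */
static void resort_insert(struct stat_node **root, struct stat_node *node)
{
	struct stat_node **p = root;

	while (*p)
		p = node->total_msec > (*p)->total_msec ? &(*p)->left
							 : &(*p)->right;
	node->left = node->right = NULL;
	*p = node;
}

/* In-order walk: appends ids to out[] and returns the new count. */
static size_t resort_walk(const struct stat_node *n, int *out,
			  size_t pos, size_t cap)
{
	if (!n)
		return pos;
	pos = resort_walk(n->left, out, pos, cap);
	assert(pos < cap);
	out[pos++] = n->id;
	return resort_walk(n->right, out, pos, cap);
}
```

The original tree stays keyed for fast lookup while events arrive; the resorted tree is built once at summary time and only walked for printing.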

Using it:

# trace -a -s sleep 1
<SNIP>
 gnome-shell (2231), 148 events, 10.3%, 0.000 msec

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   poll                  14     8.138     0.000     0.581     8.012     98.33%
   ioctl                 17     0.096     0.001     0.006     0.054     54.34%
   recvmsg               30     0.070     0.001     0.002     0.005      7.87%
   writev                 6     0.032     0.004     0.005     0.006      5.43%
   read                   4     0.010     0.002     0.003     0.003      9.83%
   write                  3     0.006     0.002     0.002     0.002     13.11%


 Xorg (1965), 150 events, 10.4%, 0.000 msec

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   select                11   377.791     0.000    34.345   267.619     72.83%
   writev                12     0.064     0.002     0.005     0.010     12.94%
   ioctl                  3     0.059     0.005     0.020     0.041     55.30%
   recvmsg               18     0.050     0.001     0.003     0.005     10.72%
   setitimer             18     0.032     0.001     0.002     0.004     10.40%
   rt_sigprocmask        10     0.014     0.001     0.001     0.004     20.81%
   poll                   2     0.004     0.001     0.002     0.003     47.14%
   read                   1     0.003     0.003     0.003     0.003      0.00%


 qemu-system-x86 (10021), 272 events, 18.8%, 0.000 msec

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   poll                 102   989.336     0.000     9.699    30.118     14.38%
   read                  34     0.200     0.003     0.006     0.014      7.01%


 qemu-system-x86 (9931), 464 events, 32.2%, 0.000 msec

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   ppoll                 96   982.288     0.000    10.232    30.035     12.59%
   write                 34     0.368     0.003     0.011     0.026      5.80%
   ioctl                102     0.290     0.001     0.003     0.010      4.74%


[root@jouet ~]# 

Gotta check why the total time per thread is zeroed tho...

- Arnaldo


* [DONE] Re: Reordering the thread output in perf trace --summary
  2016-05-04 21:41   ` Arnaldo Carvalho de Melo
@ 2016-05-05 16:04     ` Arnaldo Carvalho de Melo
  2016-05-09  8:28       ` Milian Wolff
  0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-05-05 16:04 UTC (permalink / raw)
  To: Milian Wolff; +Cc: David Ahern, linux-perf-users

On Wed, May 04, 2016 at 06:41:23PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, May 04, 2016 at 11:51:04AM +0200, Milian Wolff wrote:
> > On Wednesday, May 4, 2016 11:02:12 AM CEST Milian Wolff wrote:
> > > I would like to propose to reorder the output to sort the output in
> > > ascending total event order, such that the most interesting output is shown
> > > at the bottom of the output on the CLI. I.e. in the output above it should
> > > be something like
 
> > > perf trace --summary lab_mandelbrot_concurrent |& grep events
> > > ... continued for a total of 168 lines
> > >  QDBusConnection (19499), 132 events, 0.4%, 0.000 msec
> > >  QXcbEventReader (19498), 1094 events, 3.0%, 0.000 msec
> > >  Thread (pooled) (19500), 1982 events, 5.5%, 0.000 msec
> > >  lab_mandelbrot_ (19497), 9246 events, 25.7%, 0.000 msec
 
> > > If this is acceptable to you, can someone please tell me how to do such a
> > > seemingly simple task in C? In C++ I'd expect to add a simple std::sort
> > > somewhere, but in perf's C...? My current idea would be to run
> > > machine__for_each_thread and store the event count + thread pointer in
> > > another temporary buffer, which I then qsort and finally iterate over. Does
> > > that sound OK, or how would you approach this task?
  
> > While at it, can we similarly reorder the output of the per-thread syscall 
> > list? At the moment it is e.g.:
 
> Take a look at my perf/core branch, I have it working there.
 
> I'm in the process of experimenting with creating some kind of template
> for resorting rb_trees, that will reduce the boilerplate while keeping
> it following the principles described in Documentation/rbtree.txt.

Ok, done, got really small and easy to change the keys if we want to,
not dynamically tho as-is now, but should be easy, with offsetof 8-)
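
One way the offsetof remark could play out, as a rough sketch rather than what is in perf/core: the sort key is chosen at run time by passing the byte offset of a struct member. The `syscall_stats` layout and all function names are assumptions, and every candidate key is assumed to be an unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stats record; all sortable members share one type. */
struct syscall_stats {
	unsigned long calls;
	unsigned long total_usec;
	unsigned long max_usec;
};

/* Byte offset of the member currently used as the sort key. */
static size_t sort_key_offset;

static unsigned long key_of(const void *entry)
{
	return *(const unsigned long *)((const char *)entry + sort_key_offset);
}

/* Descending order on whichever member sort_key_offset selects. */
static int cmp_by_key_desc(const void *a, const void *b)
{
	unsigned long ka = key_of(a), kb = key_of(b);

	return (ka < kb) - (ka > kb);
}

static void sort_stats(struct syscall_stats *s, size_t nr, size_t key_offset)
{
	assert(key_offset + sizeof(unsigned long) <= sizeof(*s));
	sort_key_offset = key_offset;
	qsort(s, nr, sizeof(*s), cmp_by_key_desc);
}
```

A caller would pick the key with e.g. `sort_stats(s, nr, offsetof(struct syscall_stats, total_usec))`. The file-scope offset keeps the sketch within standard qsort, at the cost of not being reentrant.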

Anyway, I'm satisfied and pushed to perf/core, now looking at why total
thread time is zeroed...

Please take a look and check if it works for you,

- Arnaldo


* Re: [DONE] Re: Reordering the thread output in perf trace --summary
  2016-05-05 16:04     ` [DONE] " Arnaldo Carvalho de Melo
@ 2016-05-09  8:28       ` Milian Wolff
  2016-05-09 16:25         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Milian Wolff @ 2016-05-09  8:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: David Ahern, linux-perf-users


On Thursday, May 5, 2016 1:04:02 PM CEST Arnaldo Carvalho de Melo wrote:
> On Wed, May 04, 2016 at 06:41:23PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, May 04, 2016 at 11:51:04AM +0200, Milian Wolff wrote:
> > > On Wednesday, May 4, 2016 11:02:12 AM CEST Milian Wolff wrote:
> > > > I would like to propose to reorder the output to sort the output in
> > > > ascending total event order, such that the most interesting output is
> > > > shown
> > > > at the bottom of the output on the CLI. I.e. in the output above it
> > > > should
> > > > be something like
> > > > 
> > > > perf trace --summary lab_mandelbrot_concurrent |& grep events
> > > > ... continued for a total of 168 lines
> > > > 
> > > >  QDBusConnection (19499), 132 events, 0.4%, 0.000 msec
> > > >  QXcbEventReader (19498), 1094 events, 3.0%, 0.000 msec
> > > >  Thread (pooled) (19500), 1982 events, 5.5%, 0.000 msec
> > > >  lab_mandelbrot_ (19497), 9246 events, 25.7%, 0.000 msec
> > > > 
> > > > If this is acceptable to you, can someone please tell me how to do
> > > > such a
> > > > > seemingly simple task in C? In C++ I'd expect to add a simple
> > > > > std::sort
> > > > > somewhere, but in perf's C...? My current idea would be to run
> > > > > machine__for_each_thread and store the event count + thread pointer in
> > > > another temporary buffer, which I then qsort and finally iterate over.
> > > > Does
> > > > that sound OK, or how would you approach this task?
> > > 
> > > While at it, can we similarly reorder the output of the per-thread
> > > syscall
> > 
> > > list? At the moment it is e.g.:
> > Take a look at my perf/core branch, I have it working there.
> > 
> > I'm in the process of experimenting with creating some kind of template
> > for resorting rb_trees, that will reduce the boilerplate while keeping
> > it following the principles described in Documentation/rbtree.txt.
> 
> Ok, done, got really small and easy to change the keys if we want to,
> not dynamically tho as-is now, but should be easy, with offsetof 8-)
> 
> Anyway, I'm satisfied and pushed to perf/core, now looking at why total
> thread time is zeroed...
> 
> Please take a look and check if it works for you,

Great Arnaldo, thanks a lot! A pleasant surprise to come home from a sunny 
weekend and see this gem waiting for me :)

I played around with it, and it does work as advertised. Great work!

Cheers
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [DONE] Re: Reordering the thread output in perf trace --summary
  2016-05-09  8:28       ` Milian Wolff
@ 2016-05-09 16:25         ` Arnaldo Carvalho de Melo
  2016-05-09 18:03           ` Milian Wolff
  0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-05-09 16:25 UTC (permalink / raw)
  To: Milian Wolff; +Cc: David Ahern, linux-perf-users

On Mon, May 09, 2016 at 10:28:01AM +0200, Milian Wolff wrote:
> On Thursday, May 5, 2016 1:04:02 PM CEST Arnaldo Carvalho de Melo wrote:
> > On Wed, May 04, 2016 at 06:41:23PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Wed, May 04, 2016 at 11:51:04AM +0200, Milian Wolff wrote:
> > > > On Wednesday, May 4, 2016 11:02:12 AM CEST Milian Wolff wrote:
> > > > While at it, can we similarly reorder the output of the per-thread
> > > > syscall

> > > > list? At the moment it is e.g.:
> > > Take a look at my perf/core branch, I have it working there.

> > > I'm in the process of experimenting with creating some kind of template
> > > for resorting rb_trees, that will reduce the boilerplate while keeping
> > > it following the principles described in Documentation/rbtree.txt.

> > Ok, done, got really small and easy to change the keys if we want to,
> > not dynamically tho as-is now, but should be easy, with offsetof 8-)

> > Anyway, I'm satisfied and pushed to perf/core, now looking at why total
> > thread time is zeroed...

> > Please take a look and check if it works for you,
 
> Great Arnaldo, thanks a lot! A pleasant surprise to come home from a sunny 
> weekend and see this gem waiting for me :)
> 
> I played around with it, and it does work as advertised. Great work!

Glad you liked it :-)

I'll probably make it use the sched:sched_stat_runtime data as the sort
key for threads if --stat is used, what do you think?

- Arnaldo


* Re: [DONE] Re: Reordering the thread output in perf trace --summary
  2016-05-09 16:25         ` Arnaldo Carvalho de Melo
@ 2016-05-09 18:03           ` Milian Wolff
  2016-05-09 20:12             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Milian Wolff @ 2016-05-09 18:03 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: David Ahern, linux-perf-users


On Monday, May 9, 2016 1:25:16 PM CEST Arnaldo Carvalho de Melo wrote:
> On Mon, May 09, 2016 at 10:28:01AM +0200, Milian Wolff wrote:
> > On Thursday, May 5, 2016 1:04:02 PM CEST Arnaldo Carvalho de Melo wrote:
> > > On Wed, May 04, 2016 at 06:41:23PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Wed, May 04, 2016 at 11:51:04AM +0200, Milian Wolff wrote:
> > > > > On Wednesday, May 4, 2016 11:02:12 AM CEST Milian Wolff wrote:
> > > > > While at it, can we similarly reorder the output of the per-thread
> > > > > syscall
> > > > 
> > > > > list? At the moment it is e.g.:
> > > > Take a look at my perf/core branch, I have it working there.
> > > > 
> > > > I'm in the process of experimenting with creating some kind of
> > > > template
> > > > for resorting rb_trees, that will reduce the boilerplate while keeping
> > > > it following the principles described in Documentation/rbtree.txt.
> > > 
> > > Ok, done, got really small and easy to change the keys if we want to,
> > > not dynamically tho as-is now, but should be easy, with offsetof 8-)
> > > 
> > > Anyway, I'm satisfied and pushed to perf/core, now looking at why total
> > > thread time is zeroed...
> > > 
> > > Please take a look and check if it works for you,
> > 
> > Great Arnaldo, thanks a lot! A pleasant surprise to come home from a sunny
> > weekend and see this gem waiting for me :)
> > 
> > I played around with it, and it does work as advertised. Great work!
> 
> Glad you liked it :-)
> 
> I'll probably make it use the sched:sched_stat_runtime data as the sort
> key for threads if --stat is used, what do you think?

You mean if `--sched` was used? I'm undecided on this. On one hand, the user 
explicitly requests `--sched` so he probably is interested in it. On the other 
hand, the total number of syscalls or wait time per thread may be more 
interesting... If you use this to find contention issues e.g. then the threads 
suffering most from contention issues will have a low runtime.

Personally, I think this is yet another situation where a proper GUI could 
solve this problem nicely. A trivial tree view with the option to adapt the 
sorting and aggregation as needed and a way to filter out certain syscalls 
post-collection would be really nice to have. I'm very much looking forward to 
having more time on my hands again to finally tackle this.

Bye
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [DONE] Re: Reordering the thread output in perf trace --summary
  2016-05-09 18:03           ` Milian Wolff
@ 2016-05-09 20:12             ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-05-09 20:12 UTC (permalink / raw)
  To: Milian Wolff; +Cc: David Ahern, linux-perf-users

On Mon, May 09, 2016 at 08:03:30PM +0200, Milian Wolff wrote:
> On Monday, May 9, 2016 1:25:16 PM CEST Arnaldo Carvalho de Melo wrote:
> > I'll probably make it use the sched:sched_stat_runtime data as the sort
> > key for threads if --stat is used, what do you think?

> You mean if `--sched` was used? I'm undecided on this. On one hand, the user 
> explicitly requests `--sched` so he probably is interested in it. On the other 
> hand, the total number of syscalls or wait time per thread may be more 
> interesting... If you use this to find contention issues e.g. then the threads 
> suffering most from contention issues will have a low runtime.

> Personally, I think this is yet another situation where a proper GUI could 
> solve this problem nicely. A trivial tree view with the option to adapt the 
> sorting and aggregation as needed and a way to filter out certain syscalls 
> post-collection would be really nice to have. I'm very much looking forward to 
> having more time on my hands again to finally tackle this.

Yeah, we could make that dynamic, the trace case is basically 'perf
top/report' working on two events at a time (enter/exit).

In the end I think the best way would be to get 'perf trace' to use the
hists browser like top and report :-)

I.e. we would be showing that 'perf trace --summary' in "real time", to
abuse that term a bit more :-)

We would be producing it and refreshing it over time.

A generic mechanism for matching up pairs (or more) events and present
them together with things like callchains, like we do now specifically
for sys_enter/sys_exit syscall tracepoints would be handy indeed.
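
As a rough illustration of what matching up pairs of events means, the sketch below pairs an enter event with the next exit event on the same thread and reports the elapsed time. It is a toy model, not perf's implementation: the `event` and `matcher` types, the fixed-size table and `matcher_feed` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

enum ev_type { EV_ENTER, EV_EXIT };

/* Hypothetical, simplified event record. */
struct event {
	enum ev_type type;
	int tid;
	unsigned long long ts_nsec;
};

#define MAX_PENDING 64

struct pending {
	int in_use;
	int tid;
	unsigned long long enter_ts;
};

/* Zero-initialize before use, e.g. struct matcher m = { 0 }; */
struct matcher {
	struct pending slots[MAX_PENDING];
};

/*
 * Feed one event. Returns 1 and stores the elapsed time in *duration
 * when the event completes an enter/exit pair, 0 otherwise.
 */
static int matcher_feed(struct matcher *m, const struct event *ev,
			unsigned long long *duration)
{
	struct pending *free_slot = NULL;
	size_t i;

	for (i = 0; i < MAX_PENDING; i++) {
		struct pending *p = &m->slots[i];

		if (p->in_use && p->tid == ev->tid) {
			if (ev->type == EV_ENTER) {
				/* Earlier exit was lost: restart the pair. */
				p->enter_ts = ev->ts_nsec;
				return 0;
			}
			*duration = ev->ts_nsec - p->enter_ts;
			p->in_use = 0;
			return 1;
		}
		if (!p->in_use && !free_slot)
			free_slot = p;
	}
	if (ev->type == EV_EXIT)
		return 0; /* exit without a recorded enter */
	assert(free_slot != NULL); /* sketch: table assumed large enough */
	free_slot->in_use = 1;
	free_slot->tid = ev->tid;
	free_slot->enter_ts = ev->ts_nsec;
	return 0;
}
```

A generic version would key the pending table on more than the thread id and carry callchains or other payload along with the timestamp.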

But feel encouraged to try a GUI to exercise your ideas, hopefully the
existing infrastructure can get readily used for that, let us know about
any change that you think would help with that.

- Arnaldo
