From: Benjamin King <benjaminking@web.de>
To: Milian Wolff <milian.wolff@kdab.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Failure to parallelize
Date: Thu, 18 Aug 2016 20:50:48 +0200
Message-ID: <20160818185048.GA2242@localhost>
In-Reply-To: <21975224.qTVBMiFlMz@milian-kdab2>

On Thu, Aug 18, 2016 at 11:56:45AM +0200, Milian Wolff wrote:
>> Is there some way to make the difference [between threaded and non
>> threaded code] more visible in perf?
>
>I think that won't even be detected by perf's way of doing sleep time
>profiling, e.g.:
>
>https://github.com/milianw/shell-helpers/blob/master/perf-sleep-record
>
>Because there is no contention - it's simply Amdahl's law that's tripping you
>up. Having a look at the CPU utilization is very important when writing
>(supposedly) parallel code.

Yes, I should have looked at CPU utilization more closely right away. But
even now that I know this, it seems awkward that I am unable to demonstrate
with my trusty old profiler which part of my program just took longer in terms
of wall clock time. I don't have VTune at my disposal, but it could be that I
am using the wrong tool for the job.
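
Short of a proper tool, bracketing each region with a monotonic clock would at
least show the wall clock split by hand. A minimal sketch, assuming POSIX
clock_gettime() is available; wall_seconds(), time_region() and busy_region()
are names made up for illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Hypothetical helper: monotonic wall clock time in seconds. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: wall clock seconds spent in one call of fn(). */
double time_region(void (*fn)(void))
{
    double start = wall_seconds();
    fn();
    return wall_seconds() - start;
}

/* Stand-in for a region of interest; any void(void) function works. */
void busy_region(void)
{
    volatile long i;
    for (i = 0; i < 100000; ++i)
        ;
}
```

Calling time_region() once per loop and printing the two results gives the
wall clock share of each region, independent of how many threads were busy.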

Still, I dabbled a bit with "perf record -s ...; perf report -T", but I find the
output a little confusing. To wit:

-----8< noppy1.c
#include <omp.h>
void foo() { int i; for ( i = 0; i < 900; ++i ) asm("nop;nop;nop;nop;"); }
void bar() { int i; for ( i = 0; i < 100; ++i ) asm("nop;nop;nop;nop;"); }
int main() {
  long i;
#pragma omp parallel for
  for ( i = 0; i < 1000000; ++i ) foo();
  for ( i = 0; i < 1000000; ++i ) bar();
}
-----8< noppy2.c
#include <omp.h>
void foo() { int i; for ( i = 0; i < 900; ++i ) asm("nop;nop;nop;nop;"); }
void bar() { int i; for ( i = 0; i < 100; ++i ) asm("nop;nop;nop;nop;"); }
int main() {
  long i;
#pragma omp parallel for
  for ( i = 0; i < 1000000; ++i ) foo();
#pragma omp parallel for
  for ( i = 0; i < 1000000; ++i ) bar();
}
-----8< gcc noppy1.c -g -fopenmp -o noppy1;perf record -s ./noppy1;perf report -T
    92.15%  noppy1   noppy1              [.] foo
     7.08%  noppy1   noppy1              [.] bar
    ...
#  PID   TID  cycles:pp   cycles:pp  cycles:pp
  3853  3856          0  1492046281          0
  3853  3854          0    57482400          0
  3853  3855          0           0          0
-----8< gcc noppy2.c -g -fopenmp -o noppy2;perf record -s ./noppy2;perf report -T
    88.97%  noppy2   noppy2            [.] foo                   
    10.27%  noppy2   noppy2            [.] bar                   
    ...
#  PID   TID        cycles:pp  cycles:pp  cycles:pp
  3869  3870                0   56778112          0
  3869  3871       2180814133   57030240          0
  3869  3872  139866901929176          0          0
-----8<


So, there is some difference in cycles:pp, but I totally don't get what the
table at the end of the 'perf report -T' output is supposed to mean. The large
value for TID 3872 looks broken.


I had more luck with 'perf record --per-thread':

-----8< perf record --per-thread ./noppy1;perf report
  68.90%  noppy1   noppy1            [.] foo
  29.79%  noppy1   noppy1            [.] bar
...
-----8< perf record --per-thread ./noppy2;perf report
  87.18%  noppy2   noppy2            [.] foo
  10.35%  noppy2   noppy2            [.] bar
...
-----8<

So, noppy1 looks different, and I can see that the split of effort between foo
and bar has shifted. Adding '--show-nr-samples' or '--show-total-period' then
tells me that the effort for foo() stays the same, but bar() gets more
expensive.
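
For reference, the wall clock difference that Amdahl's law predicts for the two
programs can be worked out from the loop counts alone (900 units in foo(), 100
in bar()). This is an idealized sketch, not measured data: it assumes perfect
scaling and no OpenMP overhead, and the function names are made up for
illustration:

```c
#include <assert.h>

/* Work units per outer iteration, taken from the inner loop counts above. */
#define FOO_UNITS 900.0
#define BAR_UNITS 100.0

/* noppy1: foo() is split across the threads, bar() runs on a single thread. */
double noppy1_wall(int threads)
{
    assert(threads > 0);
    return FOO_UNITS / threads + BAR_UNITS;
}

/* noppy2: both loops are split across the threads. */
double noppy2_wall(int threads)
{
    assert(threads > 0);
    return (FOO_UNITS + BAR_UNITS) / threads;
}

/* Speedup relative to running all 1000 units on one thread. */
double speedup(double wall)
{
    return (FOO_UNITS + BAR_UNITS) / wall;
}
```

With 4 threads (an assumed count), noppy1_wall(4) is 325 units against 250 for
noppy2_wall(4), i.e. a speedup of about 3.08 instead of 4, even though the
total number of cycles spent is the same in both programs.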

Unfortunately, I still do not understand what exactly '--per-thread' is doing.
The manpage is a little brief, and I have not looked at the code yet.
But it's a start!

Cheers,
  Benjamin
