linux-perf-users.vger.kernel.org archive mirror
From: Benjamin King <benjaminking@web.de>
To: Milian Wolff <milian.wolff@kdab.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Failure to parallelize
Date: Thu, 18 Aug 2016 20:50:48 +0200
Message-ID: <20160818185048.GA2242@localhost>
In-Reply-To: <21975224.qTVBMiFlMz@milian-kdab2>

On Thu, Aug 18, 2016 at 11:56:45AM +0200, Milian Wolff wrote:
>> Is there some way to make the difference [between threaded and non
>> threaded code] more visible in perf?
>
>I think that won't even be detected by perf's way of doing sleep time
>profiling, e.g.:
>
>https://github.com/milianw/shell-helpers/blob/master/perf-sleep-record
>
>Because there is no contention - it's simply Amdahl's law that's tripping you
>up. Having a look at the CPU utilization is very important when writing
>(supposedly) parallel code.
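Here is a worked number for the Amdahl effect. Suppose a fraction s of the single-threaded run time is serial. Then the best possible speedup on n threads is 1 / (s + (1 - s)/n).

In noppy1.c below, bar() accounts for 100 of every 1000 nop-loop iterations and is not parallelized. Assuming foo() and bar() cost the same per iteration, s is therefore about 0.1. With three threads (the perf report -T tables below list three TIDs), the ceiling is about 1 / (0.1 + 0.9/3) = 2.5x rather than 3x.

A minimal C helper for the formula follows. The name amdahl_speedup is made up for illustration.

```c
/* Amdahl's law: upper bound on the speedup of a program run on n threads
 * when a fraction `serial` (between 0 and 1) of its single-threaded run
 * time cannot be parallelized. Assumes perfect scaling of the rest. */
double amdahl_speedup(double serial, int n)
{
    /* Serial part keeps its full cost; parallel part is divided by n. */
    return 1.0 / (serial + (1.0 - serial) / n);
}
```

For example, amdahl_speedup(0.1, 3) evaluates to roughly 2.5, while amdahl_speedup(0.0, 3) gives the ideal 3.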

Yes, I should have looked at CPU utilization more closely right away. But
even now that I know this, it seems awkward that I am unable to demonstrate
with my trusty old profiler which part of my program just took longer in terms
of wall clock time. I don't have VTune at my disposal, but it could be that I
am using the wrong tool for the job.
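Another way to see per-phase wall-clock time, without a profiler, is to bracket each loop with a timer. The sketch below is a variant of noppy1.c and assumes a POSIX system with clock_gettime(CLOCK_MONOTONIC). The helper names wall_seconds() and time_noppy1_phases() are hypothetical. The iteration count is a parameter (1000000 in noppy1.c) so the sketch can also be run quickly.

```c
/* Needed for clock_gettime() when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Same kernels as noppy1.c; the __asm__ spelling also works with -std=c99. */
static void foo(void) { int i; for (i = 0; i < 900; ++i) __asm__ __volatile__("nop;nop;nop;nop;"); }
static void bar(void) { int i; for (i = 0; i < 100; ++i) __asm__ __volatile__("nop;nop;nop;nop;"); }

/* Wall-clock time in seconds from a monotonic clock (unaffected by
 * system clock adjustments). */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: runs the two phases of noppy1 with n iterations
 * each and stores how long each phase took in wall-clock seconds.
 * Build with -fopenmp so the pragma parallelizes the foo() loop;
 * without it the pragma is ignored and both loops run serially. */
void time_noppy1_phases(long n, double *foo_secs, double *bar_secs)
{
    long i;
    double t0, t1, t2;

    t0 = wall_seconds();
#pragma omp parallel for
    for (i = 0; i < n; ++i)
        foo();
    t1 = wall_seconds();
    for (i = 0; i < n; ++i)   /* serial phase, as in noppy1.c */
        bar();
    t2 = wall_seconds();

    *foo_secs = t1 - t0;
    *bar_secs = t2 - t1;
}
```

Calling this from main() with n = 1000000 and printing both values under different OMP_NUM_THREADS settings should show the foo() phase getting shorter as threads are added. The bar() phase should stay roughly constant. That is the Amdahl effect expressed in wall-clock terms.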

Still, I dabbled a bit with "perf record -s ...; perf report -T", but I find the
output a little confusing. To wit:

-----8< noppy1.c
#include <omp.h>
void foo() { int i; for ( i = 0; i < 900; ++i ) asm("nop;nop;nop;nop;"); }
void bar() { int i; for ( i = 0; i < 100; ++i ) asm("nop;nop;nop;nop;"); }
int main() {
  long i;
#pragma omp parallel for
  for ( i = 0; i < 1000000; ++i ) foo();  /* parallel */
  for ( i = 0; i < 1000000; ++i ) bar();  /* serial: runs on one thread only */
}
-----8< noppy2.c
#include <omp.h>
void foo() { int i; for ( i = 0; i < 900; ++i ) asm("nop;nop;nop;nop;"); }
void bar() { int i; for ( i = 0; i < 100; ++i ) asm("nop;nop;nop;nop;"); }
int main() {
  long i;
#pragma omp parallel for
  for ( i = 0; i < 1000000; ++i ) foo();  /* parallel */
#pragma omp parallel for
  for ( i = 0; i < 1000000; ++i ) bar();  /* parallel as well */
}
-----8< gcc noppy1.c -g -fopenmp -o noppy1;perf record -s ./noppy1;perf report -T
    92.15%  noppy1   noppy1              [.] foo
     7.08%  noppy1   noppy1              [.] bar
    ...
#  PID   TID  cycles:pp   cycles:pp  cycles:pp
  3853  3856          0  1492046281          0
  3853  3854          0    57482400          0
  3853  3855          0           0          0
-----8< gcc noppy2.c -g -fopenmp -o noppy2;perf record -s ./noppy2;perf report -T
    88.97%  noppy2   noppy2            [.] foo
    10.27%  noppy2   noppy2            [.] bar
    ...
#  PID   TID        cycles:pp  cycles:pp  cycles:pp
  3869  3870                0   56778112          0
  3869  3871       2180814133   57030240          0
  3869  3872  139866901929176          0          0
-----8<


So there is some difference in cycles:pp, but I don't understand what the
table at the end of the perf report -T output is supposed to mean. The huge
value for TID 3872 looks broken.


I had more luck with 'perf record --per-thread':

-----8< perf record --per-thread ./noppy1;perf report
  68.90%  noppy1   noppy1            [.] foo
  29.79%  noppy1   noppy1            [.] bar
...
-----8< perf record --per-thread ./noppy2;perf report
  87.18%  noppy2   noppy2            [.] foo
  10.35%  noppy2   noppy2            [.] bar
...
-----8<

So noppy1 now looks different, and I can see that the split between foo and
bar has shifted. Adding '--show-nr-samples' or '--show-total-period' to perf
report then tells me that the effort for foo() stays the same, but bar() gets
more expensive.

Unfortunately, I still do not understand what exactly '--per-thread' is doing.
The manpage is a little brief and I have not looked at the code yet.
But it's a start!

Cheers,
  Benjamin

Thread overview: 5+ messages
2016-08-17 13:55 Failure to parallelize Benjamin King
2016-08-18  9:56 ` Milian Wolff
2016-08-18 18:50   ` Benjamin King [this message]
2016-08-22 21:14     ` Andi Kleen
2016-08-23  6:10       ` Benjamin King