From: Benjamin King <benjaminking@web.de>
To: Milian Wolff <milian.wolff@kdab.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Failure to parallelize
Date: Thu, 18 Aug 2016 20:50:48 +0200
Message-ID: <20160818185048.GA2242@localhost>
In-Reply-To: <21975224.qTVBMiFlMz@milian-kdab2>
On Thu, Aug 18, 2016 at 11:56:45AM +0200, Milian Wolff wrote:
>> Is there some way to make the difference [between threaded and non
>> threaded code] more visible in perf?
>
>I think that won't even be detected by perf's way of doing sleep time
>profiling, e.g.:
>
>https://github.com/milianw/shell-helpers/blob/master/perf-sleep-record
>
>Because there is no contention - it's simply Amdahl's law that's tripping you
>up. Having a look at the CPU utilization is very important when writing
>(supposedly) parallel code.
Yes, I should have looked at CPU utilization more closely right away. But
even now that I know this, it seems awkward that I am unable to demonstrate
with my trusty old profiler which part of my program just took longer in terms
of wall clock time. I don't have VTune at my disposal, but it could be that I
am using the wrong tool for the job.
Still, I dabbled a bit with "perf record -s ...; perf report -T", but I find the
output a little confusing. To wit:
-----8< noppy1.c
#include <omp.h>
void foo() { int i; for ( i = 0; i < 900; ++i ) asm("nop;nop;nop;nop;"); }
void bar() { int i; for ( i = 0; i < 100; ++i ) asm("nop;nop;nop;nop;"); }
int main() {
long i;
#pragma omp parallel for
for ( i = 0; i < 1000000; ++i ) foo();
for ( i = 0; i < 1000000; ++i ) bar();
}
-----8< noppy2.c
#include <omp.h>
void foo() { int i; for ( i = 0; i < 900; ++i ) asm("nop;nop;nop;nop;"); }
void bar() { int i; for ( i = 0; i < 100; ++i ) asm("nop;nop;nop;nop;"); }
int main() {
long i;
#pragma omp parallel for
for ( i = 0; i < 1000000; ++i ) foo();
#pragma omp parallel for
for ( i = 0; i < 1000000; ++i ) bar();
}
-----8< gcc noppy1.c -g -fopenmp -o noppy1;perf record -s ./noppy1;perf report -T
92.15% noppy1 noppy1 [.] foo
7.08% noppy1 noppy1 [.] bar
...
# PID TID cycles:pp cycles:pp cycles:pp
3853 3856 0 1492046281 0
3853 3854 0 57482400 0
3853 3855 0 0 0
-----8< gcc noppy2.c -g -fopenmp -o noppy2;perf record -s ./noppy2;perf report -T
88.97% noppy2 noppy2 [.] foo
10.27% noppy2 noppy2 [.] bar
...
# PID TID cycles:pp cycles:pp cycles:pp
3869 3870 0 56778112 0
3869 3871 2180814133 57030240 0
3869 3872 139866901929176 0 0
-----8<
So, there is some difference in cycles:pp, but I really don't get what the
table at the end of perf report -T is supposed to mean. The large value for TID
3872 looks broken.
I had more luck with 'perf record --per-thread':
-----8< perf record --per-thread ./noppy1;perf report
68.90% noppy1 noppy1 [.] foo
29.79% noppy1 noppy1 [.] bar
...
-----8< perf record --per-thread ./noppy2;perf report
87.18% noppy2 noppy2 [.] foo
10.35% noppy2 noppy2 [.] bar
...
-----8<
So, noppy1 now looks different, and I can see that the split of effort between
foo and bar has shifted. Adding '--show-nr-samples' or '--show-total-period' to
perf report then tells me that the effort for foo() stays the same, while bar()
gets more expensive.
Unfortunately, I still do not understand what exactly '--per-thread' is doing.
The manpage is a little brief and I have not looked at the code yet.
But it's a start!
Cheers,
Benjamin