* build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization)
@ 2011-07-06 18:16 Darren Hart
2011-07-07 10:39 ` Richard Purdie
0 siblings, 1 reply; 7+ messages in thread
From: Darren Hart @ 2011-07-06 18:16 UTC (permalink / raw)
To: poky@yoctoproject.org; +Cc: Josh Lock, Tom Rini
[-- Attachment #1: Type: text/plain, Size: 1655 bytes --]
I ran the attached bb-matrix.sh on the following system:
CPU (1): Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz
Cores: 4
Threads: 8
Memory: 8186560 kB
OS Disk: INTEL SSDSA2M040G2GC (SSD)
Build Disk: Hitachi HDT721050SLA360 (Spinning Media)
The script runs builds with all combinations of BB_NUMBER_THREADS and
PARALLEL_MAKE from 4 through 16.
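
For readers without the attachment, a loop of roughly this shape would
cover that matrix. This is a minimal sketch and not the attached
bb-matrix.sh: run_build is a hypothetical placeholder that only prints the
pair it would build, and the zero-padded BB-PM label format is an
assumption.

```shell
#!/bin/bash
# Sketch only: enumerate every BB_NUMBER_THREADS / PARALLEL_MAKE pair.
# run_build is a hypothetical stand-in; replace its body with a real,
# timed build invocation.
run_build() {
    local bb=$1 pm=$2
    printf '%02d-%02d\n' "$bb" "$pm"
}

# Walk the full square matrix from lo through hi on both axes.
run_matrix() {
    local lo=$1 hi=$2 bb pm
    for bb in $(seq "$lo" "$hi"); do
        for pm in $(seq "$lo" "$hi"); do
            run_build "$bb" "$pm"
        done
    done
}

run_matrix 4 16
```

With the range 4 through 16 on both axes this visits 13 x 13 = 169
combinations, which is why a full run takes so long.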
Once BB_NUMBER_THREADS hit 10, the kernel OOM Killer started killing off
tasks and build time tripled. Those runs have been removed from the dataset.
All of the runs with PARALLEL_MAKE=10 also failed, for a variety of
reasons. See bb-pm-errors.txt for details. For whatever reason, 10 seems
to be a bad number. Additional failures were seen at 09-11 and 10-14.
These have all been removed from the dat file.
From the remaining results, a clear downward trend in build time is
evident with increasing BB_NUMBER_THREADS through 8, while build time
mostly increases again with 9 (and dramatically so with 10, not shown).
Optimal build time is achieved with BB_NUMBER_THREADS=8.
Along the BB_NUMBER_THREADS=8 line, there is no clear trend with
increasing values of PARALLEL_MAKE. Local downward trends appear from
4-7 and from 11-14. Optimal build time occurs with PARALLEL_MAKE=14;
however, it beats PARALLEL_MAKE=7 by only 68 seconds.
While optimal build time is achieved with BB=8 and PM=14, a more
resource-friendly setting of BB=8 and PM=6 yields nearly as good a result.
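
To pull the fastest row for a given BB_NUMBER_THREADS value out of the dat
file, something like the following may help. It is a sketch that assumes
the file holds whitespace-separated columns in the order BB, PM, seconds
(the order the plt file reads with "u 1:2:3"); best_pm_for_bb is a
hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: print the fastest row for one BB_NUMBER_THREADS value.
# Assumes whitespace-separated columns: BB PM seconds.
best_pm_for_bb() {
    local bb=$1 datfile=$2
    awk -v bb="$bb" '
        # Compare numerically so that "08" and "8" match.
        NF >= 3 && $1 + 0 == bb + 0 && (best == "" || $3 + 0 < best + 0) {
            best = $3
            row = $0
        }
        END { if (row != "") print row }
    ' "$datfile"
}
```

For example: best_pm_for_bb 8 bb-pm-runtime-fear-jul6.dat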
To reproduce the plots and get an interactive view that you can rotate:
$ gnuplot --persist < bb-pm-matrix.plt
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
[-- Attachment #2: bb-pm-runtime-fear-jul6.dat --]
[-- Type: application/ms-tnef, Size: 6460 bytes --]
[-- Attachment #3: plot.png --]
[-- Type: image/png, Size: 14106 bytes --]
[-- Attachment #4: plot-bb.png --]
[-- Type: image/png, Size: 8406 bytes --]
[-- Attachment #5: plot-pm.png --]
[-- Type: image/png, Size: 9011 bytes --]
[-- Attachment #6: bb-matrix.sh --]
[-- Type: application/x-shellscript, Size: 1558 bytes --]
[-- Attachment #7: bb-pm-matrix.plt --]
[-- Type: text/plain, Size: 385 bytes --]
set xlabel "BB_NUMBER_THREADS"
set ylabel "PARALLEL_MAKE"
set zlabel "Build Time (seconds)"
set dgrid3d 13,6
set pm3d at b
set ticslevel 0.8
set term png
set output "plot.png"
splot "bb-pm-runtime-fear-jul6.dat" u 1:2:3 with lines
set view 90,0
set output "plot-bb.png"
replot
set view 90,90
set output "plot-pm.png"
replot
set view 60,30
set term wxt
replot
* Re: build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization)
2011-07-06 18:16 build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization) Darren Hart
@ 2011-07-07 10:39 ` Richard Purdie
2011-07-07 18:12 ` Darren Hart
0 siblings, 1 reply; 7+ messages in thread
From: Richard Purdie @ 2011-07-07 10:39 UTC (permalink / raw)
To: Darren Hart; +Cc: Josh Lock, poky@yoctoproject.org, Tom Rini
On Wed, 2011-07-06 at 11:16 -0700, Darren Hart wrote:
> I ran the attached bb-matrix.sh on the following system:
>
> CPU (1): Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz
> Cores: 4
> Threads: 8
> Memory: 8186560 kB
> OS Disk: INTEL SSDSA2M040G2GC (SSD)
> Build Disk: Hitachi HDT721050SLA360 (Spinning Media)
>
> The script runs builds with all combinations of BB_NUMBER_THREADS and
> PARALLEL_MAKE from 4 through 16.
>
> Once BB_NUMBER_THREADS hit 10, the kernel OOM Killer started killing off
> tasks and build time tripled. Those runs have been removed from the dataset.
>
> All of the runs with PARALLEL_MAKE=10 also failed, for a variety of
> reasons. See bb-pm-errors.txt for details. For whatever reason, 10 seems
> to be a bad number. Additional failures were seen at 09-11 and 10-14.
> These have all been removed from the dat file.
>
> From the remaining results, a clear downward trend in build time is
> evident with increasing BB_NUMBER_THREADS through 8, while build time
> mostly increases again with 9 (and dramatically so with 10, not shown).
> Optimal build time is achieved with BB_NUMBER_THREADS=8.
>
> Along the BB_NUMBER_THREADS=8 line, there is no clear trend with
> increasing values of PARALLEL_MAKE. Local downward trends appear from
> 4-7 and from 11-14. Optimal build time occurs with PARALLEL_MAKE=14,
> however, it only bests PARALLEL_MAKE=7 by 68 seconds.
>
> While optimal build time is achieved with BB=8 and PM=14, a more
> resource friendly setting of BB=8 and PM=6 yields nearly as good results.
Thanks Darren, I think those are interesting results.
Is the general advice we should give out therefore to set
BB_NUMBER_THREADS = PARALLEL_MAKE = number threads?
I'd love to understand why there is the peak and second dip on the
PARALLEL_MAKE curve...
It would also be good to put the script in scripts/contrib.
Cheers,
Richard
* Re: build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization)
2011-07-07 10:39 ` Richard Purdie
@ 2011-07-07 18:12 ` Darren Hart
2011-07-08 20:44 ` Robert Berger
0 siblings, 1 reply; 7+ messages in thread
From: Darren Hart @ 2011-07-07 18:12 UTC (permalink / raw)
To: Richard Purdie; +Cc: Josh Lock, poky@yoctoproject.org, Tom Rini
On 07/07/2011 03:39 AM, Richard Purdie wrote:
> On Wed, 2011-07-06 at 11:16 -0700, Darren Hart wrote:
>> I ran the attached bb-matrix.sh on the following system:
>>
>> CPU (1): Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz
>> Cores: 4
>> Threads: 8
>> Memory: 8186560 kB
>> OS Disk: INTEL SSDSA2M040G2GC (SSD)
>> Build Disk: Hitachi HDT721050SLA360 (Spinning Media)
>>
>> The script runs builds with all combinations of BB_NUMBER_THREADS and
>> PARALLEL_MAKE from 4 through 16.
>>
>> Once BB_NUMBER_THREADS hit 10, the kernel OOM Killer started killing off
>> tasks and build time tripled. Those runs have been removed from the dataset.
>>
>> All of the runs with PARALLEL_MAKE=10 also failed, for a variety of
>> reasons. See bb-pm-errors.txt for details. For whatever reason, 10 seems
>> to be a bad number. Additional failures were seen at 09-11 and 10-14.
>> These have all been removed from the dat file.
>>
>> From the remaining results, a clear downward trend in build time is
>> evident with increasing BB_NUMBER_THREADS through 8, while build time
>> mostly increases again with 9 (and dramatically so with 10, not shown).
>> Optimal build time is achieved with BB_NUMBER_THREADS=8.
>>
>> Along the BB_NUMBER_THREADS=8 line, there is no clear trend with
>> increasing values of PARALLEL_MAKE. Local downward trends appear from
>> 4-7 and from 11-14. Optimal build time occurs with PARALLEL_MAKE=14,
>> however, it only bests PARALLEL_MAKE=7 by 68 seconds.
>>
>> While optimal build time is achieved with BB=8 and PM=14, a more
>> resource friendly setting of BB=8 and PM=6 yields nearly as good results.
>
> Thanks Darren, I think those are interesting results.
>
> Is the general advice we should give out therefore to set
> BB_NUMBER_THREADS = PARALLEL_MAKE = number threads?
I don't think we have enough data to make a general recommendation, but
for 4 core systems, I think BB=8 and PM=6 is a good choice. With some
additional runs on other hardware, hopefully we can come up with a more
general number like BB=2*NR_CORES PM=1.5*NR_CORES (cores not threads).
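
As a rough illustration of that tentative formula (not a recommendation
until there is data from more hardware), the arithmetic could be sketched
like this. suggest_settings is a hypothetical helper, and the 1.5x factor
is rounded down because shell arithmetic is integer-only.

```shell
#!/bin/bash
# Sketch only: BB = 2 * cores, PM = 1.5 * cores, using integer arithmetic.
# nr_cores is the number of physical cores, not hardware threads.
suggest_settings() {
    local nr_cores=$1
    local bb=$(( nr_cores * 2 ))
    local pm=$(( nr_cores * 3 / 2 ))   # 1.5x, rounded down
    echo "BB_NUMBER_THREADS = \"$bb\""
    echo "PARALLEL_MAKE = \"-j $pm\""
}

suggest_settings 4
```

For 4 cores this gives BB=8 and PM=6, matching the setting suggested above
for this machine.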
>
> I'd love to understand why there is the peak and second dip on the
> PARALLEL_MAKE curve...
Me too! The PM axis plots are very strange and not at all expected. I'm
seeing similarly unexpected results on the 12 core machine currently
running through a 12-48 bb-matrix run.
>
> It would also be good to put the script in scripts/contrib.
Yes, I'll try to get around to that soon... this week. I think they need
a little cleanup first.
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
* Re: build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization)
2011-07-07 18:12 ` Darren Hart
@ 2011-07-08 20:44 ` Robert Berger
2011-07-09 8:36 ` Darren Hart
0 siblings, 1 reply; 7+ messages in thread
From: Robert Berger @ 2011-07-08 20:44 UTC (permalink / raw)
To: poky
Darren/Richard,
Instead of hacking hard-coded default values (or nothing) into the config
file, maybe we could default to something like this:
somehow get the number of CPUs:
CPUS=$(grep ^processor /proc/cpuinfo | wc -l)
echo CPUS=${CPUS}
or
CPUS_UBUNTU=$(getconf _NPROCESSORS_ONLN)
echo CPUS_UBUNTU=${CPUS_UBUNTU}
(don't know if the second one will also work with other distros than Ubuntu)
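
The two approaches can be combined so that one acts as a fallback for the
other. This is only a sketch: count_cpus is a hypothetical helper name.
getconf _NPROCESSORS_ONLN is provided by glibc, so it should not be
specific to Ubuntu, but that is worth checking on each target distro. Note
that both methods count logical CPUs (hardware threads), not physical
cores.

```shell
#!/bin/bash
# Sketch only: count online logical CPUs, trying getconf first and
# falling back to /proc/cpuinfo. This counts hardware threads, not
# physical cores.
count_cpus() {
    local n
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    if [ -z "$n" ] && [ -r /proc/cpuinfo ]; then
        n=$(grep -c '^processor' /proc/cpuinfo)
    fi
    # Default to 1 if neither method produced a value.
    echo "${n:-1}"
}

echo "CPUS=$(count_cpus)"
```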
Then do some calculation to pick the magic numbers to use for
BB_NUMBER_THREADS and PARALLEL_MAKE:
e.g. what was suggested: BB=2*NR_CORES PM=1.5*NR_CORES
Regards,
Robert
...For every complex problem there is a solution which is simple, neat
and wrong. -- H.L. Mencken
My public pgp key is available at:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1
* Re: build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization)
2011-07-08 20:44 ` Robert Berger
@ 2011-07-09 8:36 ` Darren Hart
2011-07-09 21:16 ` Chris Larson
0 siblings, 1 reply; 7+ messages in thread
From: Darren Hart @ 2011-07-09 8:36 UTC (permalink / raw)
To: gmane; +Cc: poky
On 07/08/2011 01:44 PM, Robert Berger wrote:
> Darren/Richard,
>
> Maybe we could instead of hacking hard coded default values (or nothing)
> into the config file default to something like this:
>
> somehow get the number of CPUs:
>
> CPUS=$(grep ^processor /proc/cpuinfo | wc -l)
> echo CPUS=${CPUS}
>
> or
>
> CPUS_UBUNTU=$(getconf _NPROCESSORS_ONLN)
> echo CPUS_UBUNTU=${CPUS_UBUNTU}
>
> (don't know if the second one will also work with other distros than Ubuntu)
>
> Do some calculation which magic number for BB_NUMBER_THREADS and
> PARALLEL_MAKE to use:
>
> e.g. what was suggested: BB=2*NR_CORES PM=1.5*NR_CORES
My concern with this is that on larger machines I'm seeing very
different optimal multipliers. On my 12 core with a RAID 0 build array,
the ideal setting seems to be BB=12 PM=12.
Until we can better characterize the ideal settings, I think we are
better off documenting what works for specific systems. Now perhaps we
need to do something that caps the number, but that is sure to be wrong
in short order as well.
As your signature suggests, the solution to this isn't likely to be
simple ;-)
--
Darren
>
> Regards,
>
> Robert
> ...For every complex problem there is a solution which is simple, neat
> and wrong. -- H.L. Mencken
>
> My public pgp key is available at:
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
* Re: build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization)
2011-07-09 8:36 ` Darren Hart
@ 2011-07-09 21:16 ` Chris Larson
2011-07-10 7:13 ` Darren Hart
0 siblings, 1 reply; 7+ messages in thread
From: Chris Larson @ 2011-07-09 21:16 UTC (permalink / raw)
To: Darren Hart; +Cc: gmane, poky
On Sat, Jul 9, 2011 at 1:36 AM, Darren Hart <dvhart@linux.intel.com> wrote:
> On 07/08/2011 01:44 PM, Robert Berger wrote:
>> Darren/Richard,
>>
>> Maybe we could instead of hacking hard coded default values (or nothing)
>> into the config file default to something like this:
>>
>> somehow get the number of CPUs:
>>
>> CPUS=$(grep ^processor /proc/cpuinfo | wc -l)
>> echo CPUS=${CPUS}
>>
>> or
>>
>> CPUS_UBUNTU=$(getconf _NPROCESSORS_ONLN)
>> echo CPUS_UBUNTU=${CPUS_UBUNTU}
>>
>> (don't know if the second one will also work with other distros than Ubuntu)
>>
>> Do some calculation which magic number for BB_NUMBER_THREADS and
>> PARALLEL_MAKE to use:
>>
>> e.g. what was suggested: BB=2*NR_CORES PM=1.5*NR_CORES
>
> My concern with this is that on larger machines I'm seeing very
> different optimal multipliers. On my 12 core with a RAID 0 build array,
> the ideal setting seems to be BB=12 PM=12.
>
> Until we can better characterize the ideal settings, I think we are
> better off documenting what works for specific systems. Now perhaps we
> need to do something that caps the number, but that is sure to be wrong
> in short order as well.
>
> As your signature suggests, the solution to this isn't likely to be
> simple ;-)
This may be rather specific to my personal setup, but I use
https://gist.github.com/776390 -- you'll note that you can adjust the
scaling factors via variables.
--
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics
* Re: build performance: bb-matrix on 4-core (BB_NUMBER_THREADS and PARALLEL_MAKE optimization)
2011-07-09 21:16 ` Chris Larson
@ 2011-07-10 7:13 ` Darren Hart
0 siblings, 0 replies; 7+ messages in thread
From: Darren Hart @ 2011-07-10 7:13 UTC (permalink / raw)
To: Chris Larson; +Cc: gmane, poky
On 07/09/2011 02:16 PM, Chris Larson wrote:
> On Sat, Jul 9, 2011 at 1:36 AM, Darren Hart <dvhart@linux.intel.com> wrote:
>> On 07/08/2011 01:44 PM, Robert Berger wrote:
>>> Darren/Richard,
>>>
>>> Maybe we could instead of hacking hard coded default values (or nothing)
>>> into the config file default to something like this:
>>>
>>> somehow get the number of CPUs:
>>>
>>> CPUS=$(grep ^processor /proc/cpuinfo | wc -l)
>>> echo CPUS=${CPUS}
>>>
>>> or
>>>
>>> CPUS_UBUNTU=$(getconf _NPROCESSORS_ONLN)
>>> echo CPUS_UBUNTU=${CPUS_UBUNTU}
>>>
>>> (don't know if the second one will also work with other distros than Ubuntu)
>>>
>>> Do some calculation which magic number for BB_NUMBER_THREADS and
>>> PARALLEL_MAKE to use:
>>>
>>> e.g. what was suggested: BB=2*NR_CORES PM=1.5*NR_CORES
>>
>> My concern with this is that on larger machines I'm seeing very
>> different optimal multipliers. On my 12 core with a RAID 0 build array,
>> the ideal setting seems to be BB=12 PM=12.
>>
>> Until we can better characterize the ideal settings, I think we are
>> better off documenting what works for specific systems. Now perhaps we
>> need to do something that caps the number, but that is sure to be wrong
>> in short order as well.
>>
>> As your signature suggests, the solution to this isn't likely to be
>> simple ;-)
>
> This may be rather specific to my personal setup, but I use
> https://gist.github.com/776390 -- you'll note that you can adjust the
> scaling factors via variables.
Something like this would probably be a good improvement - but it will
need some sort of step function (of cpu count) for the multipliers. I'm
concerned this step function will be tedious to maintain. I suspect the
ideal number is also dependent on build path storage (spinning disk,
RAID, SSD, tmpfs, etc.): faster storage can likely benefit from higher
thread counts, whereas slower storage just gets more and more bogged
down under higher thread counts. I'll have some numbers from the 12 core
on Tuesday if I'm extrapolating accurately.
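
To make the step function idea concrete, here is a minimal sketch. Only
two points come from measurements in this thread: the 4-core box (2x and
1.5x) and the preliminary 12-core result (1x and 1x). The 8-core boundary
between the two steps is purely hypothetical, settings_for_cores is an
invented name, and storage type is not modelled at all.

```shell
#!/bin/bash
# Sketch only: a step function of core count for the two multipliers.
# The 8-core boundary is hypothetical; only the 4-core (2x, 1.5x) and
# 12-core (1x, 1x) points come from measurements in this thread.
settings_for_cores() {
    local cores=$1 bb pm
    if [ "$cores" -le 8 ]; then
        bb=$(( cores * 2 ))
        pm=$(( cores * 3 / 2 ))   # 1.5x, rounded down
    else
        bb=$cores
        pm=$cores
    fi
    # Print "BB PM" on one line.
    echo "$bb $pm"
}

settings_for_cores 4
settings_for_cores 12
```

Each new class of machine or storage would need another branch, which is
the maintenance burden I'm worried about.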
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel