* Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
@ 2020-12-03 17:48 Ross Burton
2020-12-03 18:20 ` [OE-core] " Alexander Kanavin
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: Ross Burton @ 2020-12-03 17:48 UTC (permalink / raw)
To: OE-core
Hi,
Currently, BB_NUMBER_THREADS and PARALLEL_MAKE use the number of cores
available unless told otherwise. This was a good idea six years
ago[1] but some modern machines are moving to very large core counts.
For example, 88 core dual Xeons are fairly common. A ThunderX2 has 256
hardware threads (2 sockets, 32 cores per socket, 4 hyperthreads per
physical core). The Ampere Altra is dual socket 2*80=160 cores.
At this level of parallelisation the sheer amount of I/O from the
unpack storm is quite excessive. As a strawman argument, I propose a
hard cap to the default BB_NUMBER_THREADS of -- and I'm literally
making up numbers here -- 32. Maybe 64. Comments?
Cheers,
Ross
[1] http://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=1529ef0504542145f2b81b2dba4bcc81d5dac96e
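A minimal sketch of what such a cap amounts to, assuming the default is simply the detected CPU count clamped to an upper bound. The name `capped_thread_count` and the constant `DEFAULT_THREAD_CAP` are hypothetical, and 32 is only the strawman number from this message, not an existing BitBake setting.

```python
import os

# Hypothetical cap taken from the strawman in this message; not a BitBake setting.
DEFAULT_THREAD_CAP = 32


def capped_thread_count(cpus=None, cap=DEFAULT_THREAD_CAP):
    """Return the detected CPU count, limited to `cap` and never below 1."""
    if cpus is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpus = os.cpu_count() or 1
    return max(1, min(cpus, cap))


# The machines mentioned above would all end up at the cap.
for name, cpus in [("dual Xeon", 88), ("Ampere Altra", 160), ("ThunderX2", 256)]:
    print(f"{name}: {cpus} CPUs -> default of {capped_thread_count(cpus)} threads")
```

Machines below the cap keep their full core count, so only the very large systems would see a change in the default.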
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Alexander Kanavin @ 2020-12-03 18:20 UTC (permalink / raw)
To: Ross Burton; +Cc: OE-core
I'd rather teach bitbake to abstain from starting new tasks when I/O or CPU
gets tight.
Alex
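As a rough illustration of this idea, here is a sketch of a check a scheduler could run before starting another task. It assumes the 1-minute load average is an acceptable proxy for "tight" (on Linux it also counts tasks blocked on I/O). The function names and the `max_load_per_cpu` threshold are hypothetical and are not part of bitbake.

```python
import os


def can_start_task(load_1min, cpus, max_load_per_cpu=1.0):
    """Return True when the 1-minute load average leaves room for another task."""
    return load_1min < cpus * max_load_per_cpu


def system_has_headroom(max_load_per_cpu=1.0):
    """Apply can_start_task() to the live system (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return can_start_task(load_1min, os.cpu_count() or 1, max_load_per_cpu)
```

A scheduler loop would call `system_has_headroom()` before launching each new task and hold back while it returns False. The load average reacts slowly, so a real implementation would probably want a faster signal.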
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Andre McCurdy @ 2020-12-03 18:31 UTC (permalink / raw)
To: Ross Burton; +Cc: OE-core
On Thu, Dec 3, 2020 at 9:48 AM Ross Burton <ross@burtonini.com> wrote:
> At this level of parallelisation the sheer amount of I/O from the
> unpack storm is quite excessive. As a strawman argument, I propose a
> hard cap to the default BB_NUMBER_THREADS of -- and I'm literally
> making up numbers here -- 32. Maybe 64. Comments?
Since the default should be the "safe" default (and users wanting or
needing more are expected to profile and tune manually) a cap of 32
sounds reasonable.
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Paul Barker @ 2020-12-03 18:32 UTC (permalink / raw)
To: Ross Burton; +Cc: OE-core
On Thu, 3 Dec 2020 at 17:48, Ross Burton <ross@burtonini.com> wrote:
> At this level of parallelisation the sheer amount of I/O from the
> unpack storm is quite excessive. As a strawman argument, I propose a
> hard cap to the default BB_NUMBER_THREADS of -- and I'm literally
> making up numbers here -- 32. Maybe 64. Comments?
This is really going to depend on what storage technology you're
using. I used to limit both to 8 when I was using traditional HDDs.
With NVMe drives I see no need for a limit, at least up to the largest
systems I've built on (12 cores / 24 threads).
--
Paul Barker
Konsulko Group
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Konrad Weihmann @ 2020-12-03 18:51 UTC (permalink / raw)
To: openembedded-core
+1 for Alex's comment.
As I/O (and potentially RAM) is clearly the bottleneck here, limiting
to some arbitrary value doesn't address the actual problem.
I would rather see real resource management.
From my point of view there is a huge difference between packaging 128
shell scripts in parallel and invoking 128 gcc linker runs.
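To make the contrast concrete, here is a small sketch of cost-weighted admission, assuming each task type can be given a relative resource cost. The cost table, the task names and the `admit_tasks` helper are all made up for illustration. Real costs would have to be measured, and nothing like this exists in bitbake as far as this thread says.

```python
# Hypothetical relative costs per task type; real values would need measuring.
TASK_COST = {"package_script": 1, "compile": 4, "link": 16}


def admit_tasks(queue, budget):
    """Start queued tasks in order, skipping any whose cost would exceed the budget."""
    started, used = [], 0
    for task in queue:
        cost = TASK_COST[task]
        if used + cost <= budget:
            started.append(task)
            used += cost
    return started, used


# With the same budget, 128 cheap tasks all run but only 8 linker runs do.
print(len(admit_tasks(["package_script"] * 128, budget=128)[0]))
print(len(admit_tasks(["link"] * 128, budget=128)[0]))
```

A flat thread count treats both queues identically. A budget expressed in cost units lets the cheap tasks fan out while limiting the expensive ones.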
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Khem Raj @ 2020-12-03 18:56 UTC (permalink / raw)
To: Ross Burton; +Cc: OE-core
On Thu, Dec 3, 2020 at 9:48 AM Ross Burton <ross@burtonini.com> wrote:
> At this level of parallelisation the sheer amount of I/O from the
> unpack storm is quite excessive. As a strawman argument, I propose a
> hard cap to the default BB_NUMBER_THREADS of -- and I'm literally
> making up numbers here -- 32. Maybe 64. Comments?
>
I think capping at, say, 32 might be fine. We should also think about
how to make it widely known to users, since this will be important for
them to know.
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Richard Purdie @ 2020-12-03 23:05 UTC (permalink / raw)
To: Ross Burton, OE-core
On Thu, 2020-12-03 at 17:48 +0000, Ross Burton wrote:
> At this level of parallelisation the sheer amount of I/O from the
> unpack storm is quite excessive. As a strawman argument, I propose a
> hard cap to the default BB_NUMBER_THREADS of -- and I'm literally
> making up numbers here -- 32. Maybe 64. Comments?
I've had no issues on 88 core systems. I'm not sure there is an
"automatic" value we can guess at here and it is mentioned in
local.conf for the user to customise...
What might make more sense is to have site.conf from a user's homedir
working better, although that does have issues of its own.
I do agree with Alex that having bitbake throttle on system load would
be nicer.
Cheers,
Richard
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Mikko Rapeli @ 2020-12-04 6:38 UTC (permalink / raw)
To: ross; +Cc: openembedded-core
Hi,
On Thu, Dec 03, 2020 at 05:48:16PM +0000, Ross Burton wrote:
> At this level of parallelisation the sheer amount of I/O from the
> unpack storm is quite excessive. As a strawman argument, I propose a
> hard cap to the default BB_NUMBER_THREADS of -- and I'm literally
> making up numbers here -- 32. Maybe 64. Comments?
The number of cores is far from sufficient in the real world. The amount of physical RAM
should be taken into account too. I'd say 2 GB of physical RAM per thread should be available
to compile and link modern C++ software, to keep the out-of-memory killer from kicking in.
Here is one algorithm which avoids the OOM killer in our case:
# Assumption: get_mem_total() returns total RAM in GB, so mem_cpus is GB per CPU
mem = get_mem_total()
cpus = get_number_cpus()
mem_cpus = (mem * 1.0) / cpus
if cpus == 1:
    # In case of a single CPU, don't parallelize
    self.bb_number_threads = 1
    self.parallel_make = make_j(1)
elif mem_cpus > 8:
    # Plenty of RAM per CPU: use every CPU for both settings
    self.bb_number_threads = cpus
    self.parallel_make = make_j(cpus)
elif mem_cpus >= 4:
    # Keep all bitbake threads, halve the make jobs
    self.bb_number_threads = cpus
    self.parallel_make = make_j(divide_cpus(cpus, 2))
elif mem_cpus >= 2:
    # Halve both
    self.bb_number_threads = divide_cpus(cpus, 2)
    self.parallel_make = make_j(divide_cpus(cpus, 2))
else:
    # Less than 2 GB per CPU: halve the threads, quarter the make jobs
    self.bb_number_threads = divide_cpus(cpus, 2)
    self.parallel_make = make_j(divide_cpus(cpus, 4))
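
A self-contained version of the tiers above, to show what they yield on a few machine sizes. This is only a sketch under two assumptions: memory is measured in GB, and `divide_cpus` is integer division that never goes below 1. The original helpers (`get_mem_total`, `make_j`, `divide_cpus`) are not shown in this message, so the ones here are stand-ins, and `pick_parallelism` is a hypothetical name.

```python
def divide_cpus(cpus, divisor):
    """Assumed helper: integer division that never drops below one CPU."""
    return max(1, cpus // divisor)


def pick_parallelism(mem_gb, cpus):
    """Return (bb_number_threads, make_jobs) following the tiers above."""
    mem_per_cpu = mem_gb / cpus
    if cpus == 1:
        return 1, 1
    if mem_per_cpu > 8:
        return cpus, cpus
    if mem_per_cpu >= 4:
        return cpus, divide_cpus(cpus, 2)
    if mem_per_cpu >= 2:
        return divide_cpus(cpus, 2), divide_cpus(cpus, 2)
    return divide_cpus(cpus, 2), divide_cpus(cpus, 4)


for mem_gb, cpus in [(64, 8), (128, 88), (512, 160)]:
    threads, jobs = pick_parallelism(mem_gb, cpus)
    print(f"{mem_gb} GB, {cpus} CPUs -> BB_NUMBER_THREADS={threads}, PARALLEL_MAKE=-j {jobs}")
```

With these inputs an 88 CPU machine with 128 GB of RAM (about 1.5 GB per CPU) drops to 44 threads and `-j 22`, while a 160 CPU machine with 512 GB (3.2 GB per CPU) runs 80 threads and `-j 80`.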
Cheers,
-Mikko
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Mikko Rapeli @ 2020-12-04 6:39 UTC (permalink / raw)
To: alex.kanavin; +Cc: ross, openembedded-core
On Thu, Dec 03, 2020 at 07:20:11PM +0100, Alexander Kanavin wrote:
> I'd rather teach bitbake to abstain from starting new tasks when I/O or CPU
> gets tight.
And memory!
-Mikko
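
One possible signal for memory getting tight is the Linux pressure stall information (PSI) in `/proc/pressure/memory`, available on kernels built with PSI support. The sketch below parses the `avg10` value from that file format. The 10% threshold and both function names are hypothetical, and nothing in this thread says bitbake reads PSI.

```python
def parse_psi_avg10(text, kind="some"):
    """Extract the avg10 percentage from /proc/pressure/* content."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == kind:
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError(f"no '{kind}' line with avg10 found")


def memory_is_tight(threshold_pct=10.0, path="/proc/pressure/memory"):
    """Return True when recent memory stall time exceeds the threshold."""
    with open(path) as f:
        return parse_psi_avg10(f.read()) > threshold_pct


# Sample content in the format the kernel exposes.
sample = (
    "some avg10=12.50 avg60=3.10 avg300=0.80 total=123456\n"
    "full avg10=4.00 avg60=1.00 avg300=0.20 total=65432\n"
)
print(parse_psi_avg10(sample))          # 12.5
print(parse_psi_avg10(sample, "full"))  # 4.0
```

The same parser works for `/proc/pressure/cpu` and `/proc/pressure/io`, so one check could cover all three resources mentioned in this subthread.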
* Re: [OE-core] Automatic cap on BB_NUMBER_THREADS and PARALLEL_MAKE
From: Jose Quaresma @ 2020-12-04 9:23 UTC (permalink / raw)
To: Alexander Kanavin; +Cc: Ross Burton, OE-core
Alexander Kanavin <alex.kanavin@gmail.com> wrote on Thursday,
3/12/2020 at 18:20:
> I'd rather teach bitbake to abstain from starting new tasks when I/O or
> CPU gets tight.
>
This is definitely the best approach in my view, although it is more
complex to implement.
Quaresma
--
best regards,
José Quaresma