* [PATCH 0/2] Spawn extra tasks at low CPU worload @ 2018-08-13 21:04 Andreas Müller 2018-08-13 21:04 ` [PATCH 1/2] runqueue: Move decision if a task can be started to one common place Andreas Müller 2018-08-13 21:04 ` [PATCH 2/2] runqueue: Introduce load balanced task spawning Andreas Müller 0 siblings, 2 replies; 25+ messages in thread From: Andreas Müller @ 2018-08-13 21:04 UTC (permalink / raw) To: bitbake-devel If it's only to prevent others from trying the same... Andreas Müller (2): runqueue: Move decision if a task can be started to one common place runqueue: Introduce load balanced task spawning lib/bb/runqueue.py | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) -- 2.14.4 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] runqueue: Move decision if a task can be started to one common place 2018-08-13 21:04 [PATCH 0/2] Spawn extra tasks at low CPU worload Andreas Müller @ 2018-08-13 21:04 ` Andreas Müller 2018-08-15 8:37 ` Richard Purdie 2018-08-13 21:04 ` [PATCH 2/2] runqueue: Introduce load balanced task spawning Andreas Müller 1 sibling, 1 reply; 25+ messages in thread From: Andreas Müller @ 2018-08-13 21:04 UTC (permalink / raw) To: bitbake-devel Signed-off-by: Andreas Müller <schnitzeltony@gmail.com> --- lib/bb/runqueue.py | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py index 400709c1..7095ea5a 100644 --- a/lib/bb/runqueue.py +++ b/lib/bb/runqueue.py @@ -199,7 +199,7 @@ class RunQueueScheduler(object): """ Return the id of the task we should build next """ - if self.rq.stats.active < self.rq.number_tasks: + if self.rq.can_start_task(): return self.next_buildable_task() def newbuildable(self, task): @@ -1754,6 +1754,10 @@ class RunQueueExecute: valid = bb.utils.better_eval(call, locs) return valid + def can_start_task(self): + can_start = self.stats.active < self.number_tasks + return can_start + class RunQueueExecuteDummy(RunQueueExecute): def __init__(self, rq): self.rq = rq @@ -2044,7 +2048,7 @@ class RunQueueExecuteTasks(RunQueueExecute): self.build_stamps2.append(self.build_stamps[task]) self.runq_running.add(task) self.stats.taskActive() - if self.stats.active < self.number_tasks: + if self.can_start_task(): return True if self.stats.active > 0: @@ -2404,7 +2408,7 @@ class RunQueueExecuteScenequeue(RunQueueExecute): self.rq.read_workers() task = None - if self.stats.active < self.number_tasks: + if self.can_start_task(): # Find the next setscene to run for nexttask in self.rqdata.runq_setscene_tids: if nexttask in self.runq_buildable and nexttask not in self.runq_running and self.stamps[nexttask] not in self.build_stamps.values(): @@ -2463,7 +2467,7 @@ class 
RunQueueExecuteScenequeue(RunQueueExecute): self.build_stamps2.append(self.build_stamps[task]) self.runq_running.add(task) self.stats.taskActive() - if self.stats.active < self.number_tasks: + if self.can_start_task(): return True if self.stats.active > 0: -- 2.14.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
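The refactoring in this patch can be pictured with a minimal, standalone sketch. The class names below (Stats, Executor, HalfLoadExecutor) are hypothetical stand-ins and not bitbake's real classes; only can_start_task and the "active < number_tasks" comparison come from the patch. The point is that once every call site asks one method, a different policy can be tried by overriding that single method.

```python
class Stats:
    """Hypothetical stand-in for the run queue statistics object."""

    def __init__(self):
        self.active = 0  # number of tasks currently running


class Executor:
    """Hypothetical stand-in for an executor holding the task limit."""

    def __init__(self, number_tasks):
        self.number_tasks = number_tasks
        self.stats = Stats()

    def can_start_task(self):
        # The single place where the "may we start another task?" decision lives.
        return self.stats.active < self.number_tasks


class HalfLoadExecutor(Executor):
    """Illustrative experiment: only use half of the configured slots."""

    def can_start_task(self):
        return self.stats.active < max(1, self.number_tasks // 2)
```

Callers only ever call can_start_task(), so swapping Executor for HalfLoadExecutor changes the policy without touching any call site.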
* Re: [PATCH 1/2] runqueue: Move decision if a task can be started to one common place 2018-08-13 21:04 ` [PATCH 1/2] runqueue: Move decision if a task can be started to one common place Andreas Müller @ 2018-08-15 8:37 ` Richard Purdie 2018-08-15 8:43 ` Andreas Müller 0 siblings, 1 reply; 25+ messages in thread From: Richard Purdie @ 2018-08-15 8:37 UTC (permalink / raw) To: Andreas Müller, bitbake-devel On Mon, 2018-08-13 at 23:04 +0200, Andreas Müller wrote: > Signed-off-by: Andreas Müller <schnitzeltony@gmail.com> > --- > lib/bb/runqueue.py | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) I don't think it makes sense to take 2/2 as the gains don't seem worth the complexity. 1/2 is a good improvement to the code and makes experimentation easier so I'll likely take that though. Cheers, Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] runqueue: Move decision if a task can be started to one common place 2018-08-15 8:37 ` Richard Purdie @ 2018-08-15 8:43 ` Andreas Müller 0 siblings, 0 replies; 25+ messages in thread From: Andreas Müller @ 2018-08-15 8:43 UTC (permalink / raw) To: Richard Purdie; +Cc: bitbake-devel On Wed, Aug 15, 2018 at 10:37 AM, Richard Purdie <richard.purdie@linuxfoundation.org> wrote: > On Mon, 2018-08-13 at 23:04 +0200, Andreas Müller wrote: >> Signed-off-by: Andreas Müller <schnitzeltony@gmail.com> >> --- >> lib/bb/runqueue.py | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) > > I don't think it makes sense to take 2/2 as the gains don't seem worth > the complexity. Agreed: Meanwhile I found that importing psutil seems to break wic. > 1/2 is a good improvement to the code and makes > experimentation easier so I'll likely take that though. Thanks > > Cheers, > > Richard Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-13 21:04 [PATCH 0/2] Spawn extra tasks at low CPU worload Andreas Müller 2018-08-13 21:04 ` [PATCH 1/2] runqueue: Move decision if a task can be started to one common place Andreas Müller @ 2018-08-13 21:04 ` Andreas Müller 2018-08-13 21:20 ` Alexander Kanavin 1 sibling, 1 reply; 25+ messages in thread From: Andreas Müller @ 2018-08-13 21:04 UTC (permalink / raw) To: bitbake-devel

To get the most out of the build host, bitbake now detects when the CPU workload is low. If so, additional tasks are spawned. The maximum number of 'dynamic' tasks is set by BB_NUMBER_THREADS_LOW_CPU. So the user can now set a range for the task count:

Min: BB_NUMBER_THREADS
Max: BB_NUMBER_THREADS_LOW_CPU

within which bitbake operates on demand.

Some numbers for 6 core AMD bulldozer 12GB RAM / build image from scratch with 3104 tasks / PARALLEL_MAKE = "-j6" / PARALLEL_MAKEINST="-j6":

Before the patch (same as BB_NUMBER_THREADS_LOW_CPU = "0" or not set):

BB_NUMBER_THREADS | Build time
------------------+-------------
                2 | 156m48.741s
                3 | 126m27.426s
                4 | 114m30.560s  <-- winner (as suggested in doc!)
                5 | 117m2.679s
                6 | 116m37.515s
                8 | 116m37.515s
               10 | 118m18.441s
               12 | 117m38.264s

With the patch applied and BB_NUMBER_THREADS_LOW_CPU = "20" (as written in docs for max thread count):

BB_NUMBER_THREADS | Build time
------------------+-------------
                3 | 114m48.105s
                4 | 113m26.502s

Some additional notes:

+ Although not tested, I expect a bigger improvement for setscene sessions.
+ Back when do_package_qa was a dependency of do_rootfs, the performance win would have been more significant: for the image tested, only very few do_package_qa tasks were running during do_rootfs. The static threads = 4 winner had more of them and would have waited longer - sigh :)
+ It's more fun to watch bitbake at work torturing the CPU. If you want to do so using GNOME's System Monitor, be aware that the CPU History is delayed by 2-3s. I was sometimes wondering 'why more tasks now?'
- For building an image from scratch the performance win is somewhat disappointing: ~1%
- The patch creates a dependency on psutil
- time.monotonic was not used until now. It was introduced in Python 3.3 (2012) and is considered supported in all environments (whatever that means) since 3.5 (2015).

Signed-off-by: Andreas Müller <schnitzeltony@gmail.com> --- lib/bb/runqueue.py | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py index 7095ea5a..2690c2a2 100644 --- a/lib/bb/runqueue.py +++ b/lib/bb/runqueue.py @@ -37,6 +37,8 @@ from bb import monitordisk import subprocess import pickle from multiprocessing import Process +import psutil +import time bblogger = logging.getLogger("BitBake") logger = logging.getLogger("BitBake.RunQueue") @@ -1668,6 +1670,7 @@ class RunQueueExecute: self.rqdata = rq.rqdata self.number_tasks = int(self.cfgData.getVar("BB_NUMBER_THREADS") or 1) + self.number_tasks_low_cpu = int(self.cfgData.getVar("BB_NUMBER_THREADS_LOW_CPU") or 0) self.scheduler = self.cfgData.getVar("BB_SCHEDULER") or "speed" self.runq_buildable = set() @@ -1679,6 +1682,8 @@ class RunQueueExecute: self.failed_tids = [] self.stampcache = {} + self.last_cpu_percent = psutil.cpu_percent() + self.last_cpu_percent_time = time.monotonic() for mc in rq.worker: rq.worker[mc].pipe.setrunqueueexec(self) @@ -1687,6 +1692,8 @@ class RunQueueExecute: if self.number_tasks <= 0: bb.fatal("Invalid BB_NUMBER_THREADS %s" % self.number_tasks) + if self.number_tasks_low_cpu < 0: + bb.fatal("Invalid BB_NUMBER_THREADS_LOW_CPU %s" % self.number_tasks_low_cpu) def runqueue_process_waitpid(self, task, status): @@ -1756,6 +1763,17 @@ class RunQueueExecute: def can_start_task(self): can_start = self.stats.active < self.number_tasks + # Can we inject extra tasks for low workload? 
+ if not can_start and self.number_tasks_low_cpu > 0: + _time = time.monotonic() + # avoid workload inaccuray + if _time - self.last_cpu_percent_time >= 0.1: + cpu_percent = psutil.cpu_percent() + self.last_cpu_percent = cpu_percent + self.last_cpu_percent_time = _time + if cpu_percent < 90 and self.stats.active < self.number_tasks_low_cpu: + can_start = True + return can_start class RunQueueExecuteDummy(RunQueueExecute): -- 2.14.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
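The decision logic of this patch can be sketched standalone as follows. This is not the patch itself: the class LoadAwareGate and its parameter names are hypothetical, and the CPU sampler and clock are injected so the sketch runs without psutil (in a real setup psutil.cpu_percent could be passed as cpu_percent). The flattened diff above does not show whether the final check sits inside the 0.1 s guard; the sketch assumes the cached reading is reused between samples, which may differ from what the patch does.

```python
import time


class LoadAwareGate:
    """Allow extra tasks beyond the base limit while the CPU load is low."""

    def __init__(self, number_tasks, number_tasks_low_cpu, cpu_percent,
                 clock=time.monotonic, threshold=90.0, min_interval=0.1):
        self.number_tasks = number_tasks                  # base limit
        self.number_tasks_low_cpu = number_tasks_low_cpu  # upper limit, 0 disables
        self.cpu_percent = cpu_percent                    # callable returning 0..100
        self.clock = clock
        self.threshold = threshold
        self.min_interval = min_interval
        self.last_cpu_percent = cpu_percent()
        self.last_time = clock()

    def can_start_task(self, active):
        if active < self.number_tasks:
            return True
        if self.number_tasks_low_cpu <= 0:
            return False
        now = self.clock()
        # Sample at most once per min_interval; very short windows are noisy.
        if now - self.last_time >= self.min_interval:
            self.last_cpu_percent = self.cpu_percent()
            self.last_time = now
        # Assumption: fall back to the cached reading between samples.
        return (self.last_cpu_percent < self.threshold
                and active < self.number_tasks_low_cpu)
```

Note that a low CPU reading says nothing about why the CPU is idle; a host that is waiting for disk or swap looks "free" to this gate as well.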
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-13 21:04 ` [PATCH 2/2] runqueue: Introduce load balanced task spawning Andreas Müller @ 2018-08-13 21:20 ` Alexander Kanavin 2018-08-13 21:30 ` Andreas Müller 2018-08-14 1:11 ` Martin Jansa 0 siblings, 2 replies; 25+ messages in thread From: Alexander Kanavin @ 2018-08-13 21:20 UTC (permalink / raw) To: Andreas Müller; +Cc: bitbake-devel 2018-08-13 23:04 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: > To get most out of build host, bitbake now detects if the CPU workload is low. > If so, additional tasks are spawned. Maximum 'dynamic' tasks are set by > BB_NUMBER_THREADS_LOW_CPU. So the best improvement is going from 114.5 minutes to 113.5 minutes? I don't think it's worth the trouble. Maybe it's time to invest in 16 (or even 32!) core amd threadripper? :) Alex ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-13 21:20 ` Alexander Kanavin @ 2018-08-13 21:30 ` Andreas Müller 2018-08-13 22:37 ` Andre McCurdy 2018-08-14 1:11 ` Martin Jansa 1 sibling, 1 reply; 25+ messages in thread From: Andreas Müller @ 2018-08-13 21:30 UTC (permalink / raw) To: Alexander Kanavin; +Cc: bitbake-devel On Mon, Aug 13, 2018 at 11:20 PM, Alexander Kanavin <alex.kanavin@gmail.com> wrote: > 2018-08-13 23:04 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: >> To get most out of build host, bitbake now detects if the CPU workload is low. >> If so, additional tasks are spawned. Maximum 'dynamic' tasks are set by >> BB_NUMBER_THREADS_LOW_CPU. > > So the best improvement is going from 114.5 minutes to 113.5 minutes? > I don't think it's worth the trouble. Maybe it's time to invest in 16 > (or even 32!) core amd threadripper? :) > Indeed - but I have to wait till next holiday to assemble such kind of machine. I still can't believe the results are that disappointing... Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-13 21:30 ` Andreas Müller @ 2018-08-13 22:37 ` Andre McCurdy 2018-08-14 0:11 ` Andreas Müller 2018-08-14 9:05 ` Andreas Müller 0 siblings, 2 replies; 25+ messages in thread From: Andre McCurdy @ 2018-08-13 22:37 UTC (permalink / raw) To: Andreas Müller; +Cc: bitbake-devel On Mon, Aug 13, 2018 at 2:30 PM, Andreas Müller <schnitzeltony@gmail.com> wrote: > On Mon, Aug 13, 2018 at 11:20 PM, Alexander Kanavin > <alex.kanavin@gmail.com> wrote: >> 2018-08-13 23:04 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: >>> To get most out of build host, bitbake now detects if the CPU workload is low. >>> If so, additional tasks are spawned. Maximum 'dynamic' tasks are set by >>> BB_NUMBER_THREADS_LOW_CPU. >> >> So the best improvement is going from 114.5 minutes to 113.5 minutes? >> I don't think it's worth the trouble. Maybe it's time to invest in 16 >> (or even 32!) core amd threadripper? :) >> > Indeed - but I have to wait till next holiday to assemble such kind of machine. > > I still can't believe the results are that disappointing... I wonder if you've experimented with the opposite approach, ie spawning fewer tasks when CPU load is very high? If a single do_compile task can fully load all CPUs, not running other tasks in parallel with it (especially another do_compile) might give some benefit? Dynamically boosting and dynamically lowering BB_NUMBER_THREADS based on overall CPU load both seem like logical things to do. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-13 22:37 ` Andre McCurdy @ 2018-08-14 0:11 ` Andreas Müller 2018-08-14 1:01 ` Andre McCurdy 2018-08-14 9:05 ` Andreas Müller 1 sibling, 1 reply; 25+ messages in thread From: Andreas Müller @ 2018-08-14 0:11 UTC (permalink / raw) To: Andre McCurdy; +Cc: bitbake-devel On Tue, Aug 14, 2018 at 12:37 AM, Andre McCurdy <armccurdy@gmail.com> wrote: > On Mon, Aug 13, 2018 at 2:30 PM, Andreas Müller <schnitzeltony@gmail.com> wrote: >> On Mon, Aug 13, 2018 at 11:20 PM, Alexander Kanavin >> <alex.kanavin@gmail.com> wrote: >>> 2018-08-13 23:04 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: >>>> To get most out of build host, bitbake now detects if the CPU workload is low. >>>> If so, additional tasks are spawned. Maximum 'dynamic' tasks are set by >>>> BB_NUMBER_THREADS_LOW_CPU. >>> >>> So the best improvement is going from 114.5 minutes to 113.5 minutes? >>> I don't think it's worth the trouble. Maybe it's time to invest in 16 >>> (or even 32!) core amd threadripper? :) >>> >> Indeed - but I have to wait till next holiday to assemble such kind of machine. >> >> I still can't believe the results are that disappointing... > > I wonder if you've experimented with the opposite approach, ie > spawning fewer tasks when CPU load is very high? If a single > do_compile task can fully load all CPUs, not running other tasks in > parallel with it (especially another do_compile) might give some > benefit? How shall this work? 100% should be the target. > > Dynamically boosting and dynamically lowering BB_NUMBER_THREADS based > on overall CPU load both seem like logical things to do. Meanwhile I tested this on another machine (yeah, I should have done that before sending out): quite a good processor / poor hard disk. As soon as the hard disk is the bottleneck (or when swapping) -> workload goes down -> task explosion. Not exactly a good idea. So better go with Alex's suggestion :) Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 0:11 ` Andreas Müller @ 2018-08-14 1:01 ` Andre McCurdy 0 siblings, 0 replies; 25+ messages in thread From: Andre McCurdy @ 2018-08-14 1:01 UTC (permalink / raw) To: Andreas Müller; +Cc: bitbake-devel On Mon, Aug 13, 2018 at 5:11 PM, Andreas Müller <schnitzeltony@gmail.com> wrote: > On Tue, Aug 14, 2018 at 12:37 AM, Andre McCurdy <armccurdy@gmail.com> wrote: >> On Mon, Aug 13, 2018 at 2:30 PM, Andreas Müller <schnitzeltony@gmail.com> wrote: >>> On Mon, Aug 13, 2018 at 11:20 PM, Alexander Kanavin >>> <alex.kanavin@gmail.com> wrote: >>>> 2018-08-13 23:04 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: >>>>> To get most out of build host, bitbake now detects if the CPU workload is low. >>>>> If so, additional tasks are spawned. Maximum 'dynamic' tasks are set by >>>>> BB_NUMBER_THREADS_LOW_CPU. >>>> >>>> So the best improvement is going from 114.5 minutes to 113.5 minutes? >>>> I don't think it's worth the trouble. Maybe it's time to invest in 16 >>>> (or even 32!) core amd threadripper? :) >>>> >>> Indeed - but I have to wait till next holiday to assemble such kind of machine. >>> >>> I still can't believe the results are that disappointing... >> >> I wonder if you've experimented with the opposite approach, ie >> spawning fewer tasks when CPU load is very high? If a single >> do_compile task can fully load all CPUs, not running other tasks in >> parallel with it (especially another do_compile) might give some >> benefit? > How shall this work? 100% should be target. Aim should be to limit the chance that the CPUs are completely overloaded, e.g. with 4 CPUs, try to avoid running 4 x do_compile in parallel. If you define a single do_compile task which is able to load all CPUs as "100%" then the limit (not target) should perhaps be 200%? Some experimentation would be needed to fine tune. 
>> Dynamically boosting and dynamically lowering BB_NUMBER_THREADS based >> on overall CPU load both seem like logical things to do. > Meanwhile I tested this on another machine (yeah should have done > before sending out): Quite good processor / poor harddisk. As soon as > harddisk is the bottleneck (or when swapping) -> workload goes down -> > task explosion. Not exactly a good idea. What does task explosion mean? Hitting the BB_NUMBER_THREADS_LOW_CPU limit? If BB_NUMBER_THREADS_LOW_CPU is set to something fairly safe (1.5 x BB_NUMBER_THREADS ?) hitting that limit doesn't seem like a big issue. Or does "task explosion" mean your implementation was buggy and BB_NUMBER_THREADS_LOW_CPU was exceeded? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-13 22:37 ` Andre McCurdy 2018-08-14 0:11 ` Andreas Müller @ 2018-08-14 9:05 ` Andreas Müller 2018-08-14 9:18 ` Alexander Kanavin 1 sibling, 1 reply; 25+ messages in thread From: Andreas Müller @ 2018-08-14 9:05 UTC (permalink / raw) To: Andre McCurdy; +Cc: bitbake-devel On Tue, Aug 14, 2018 at 12:37 AM, Andre McCurdy <armccurdy@gmail.com> wrote: > On Mon, Aug 13, 2018 at 2:30 PM, Andreas Müller <schnitzeltony@gmail.com> wrote: >> On Mon, Aug 13, 2018 at 11:20 PM, Alexander Kanavin >> <alex.kanavin@gmail.com> wrote: >>> 2018-08-13 23:04 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: >>>> To get most out of build host, bitbake now detects if the CPU workload is low. >>>> If so, additional tasks are spawned. Maximum 'dynamic' tasks are set by >>>> BB_NUMBER_THREADS_LOW_CPU. >>> >>> So the best improvement is going from 114.5 minutes to 113.5 minutes? >>> I don't think it's worth the trouble. Maybe it's time to invest in 16 >>> (or even 32!) core amd threadripper? :) >>> >> Indeed - but I have to wait till next holiday to assemble such kind of machine. >> >> I still can't believe the results are that disappointing... > > I wonder if you've experimented with the opposite approach, ie > spawning fewer tasks when CPU load is very high? If a single > do_compile task can fully load all CPUs, not running other tasks in > parallel with it (especially another do_compile) might give some > benefit? > > Dynamically boosting and dynamically lowering BB_NUMBER_THREADS based > on overall CPU load both seem like logical things to do. I think the patch does this if you interpret the parameters differently. So if you currently use BB_NUMBER_THREADS = 10 and want to go down to 2 in case of heavy load, set BB_NUMBER_THREADS = 2 and BB_NUMBER_THREADS_LOW_CPU = 10. Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 9:05 ` Andreas Müller @ 2018-08-14 9:18 ` Alexander Kanavin 0 siblings, 0 replies; 25+ messages in thread From: Alexander Kanavin @ 2018-08-14 9:18 UTC (permalink / raw) To: Andreas Müller; +Cc: bitbake-devel 2018-08-14 11:05 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: >> I wonder if you've experimented with the opposite approach, ie >> spawning fewer tasks when CPU load is very high? If a single >> do_compile task can fully load all CPUs, not running other tasks in >> parallel with it (especially another do_compile) might give some >> benefit? >> >> Dynamically boosting and dynamically lowering BB_NUMBER_THREADS based >> on overall CPU load both seem like logical things to do. > > I think the patch does this of you interpret parameters differently. > So if you use BB_NUMBER_THREADS = 10 currently and want to go down to > 2 in case heavy load set > > BB_NUMBER_THREADS = 2 > BB_NUMBER_THREADS_LOW_CPU = 10 Right, then that's quite useful! Is it also possible to detect the low RAM situation? Perhaps it's better to rename the parameters to BB_NUMBER_THREADS_MIN and BB_NUMBER_THREADS_MAX? Then bitbake would run tasks within the range, with the aim of keeping the CPU loaded, but not overloaded. There could even be reasonable defaults: number of cores/threads for MAX, and 2 for MIN. Alex ^ permalink raw reply [flat|nested] 25+ messages in thread
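The MIN/MAX range proposed here, together with the RAM check, could look roughly like the sketch below. Nothing in it exists in bitbake: the function names, the 90% CPU threshold (borrowed from the patch) and the 10% free-memory threshold are assumptions for illustration only. The defaults follow the suggestion above: 2 for MIN and the number of cores for MAX.

```python
import os


def default_limits():
    """Return (min_tasks, max_tasks) using the defaults suggested in the thread."""
    cores = os.cpu_count() or 1
    return 2, max(2, cores)


def allowed_tasks(min_tasks, max_tasks, cpu_percent, mem_free_fraction,
                  cpu_busy=90.0, mem_low=0.10):
    """Pick the task limit for the current load.

    cpu_percent is the overall CPU load (0..100), mem_free_fraction the share
    of RAM still available (0..1). Both thresholds are hypothetical.
    """
    if cpu_percent >= cpu_busy or mem_free_fraction <= mem_low:
        return min_tasks  # busy CPU or scarce RAM: stay at the lower bound
    return max_tasks      # idle CPU and enough RAM: allow the upper bound
```

With BB_NUMBER_THREADS = 2 and BB_NUMBER_THREADS_LOW_CPU = 10 from the previous mail, allowed_tasks(2, 10, 95.0, 0.5) gives 2 and allowed_tasks(2, 10, 40.0, 0.5) gives 10.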
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-13 21:20 ` Alexander Kanavin 2018-08-13 21:30 ` Andreas Müller @ 2018-08-14 1:11 ` Martin Jansa 2018-08-14 6:32 ` Mikko.Rapeli 1 sibling, 1 reply; 25+ messages in thread From: Martin Jansa @ 2018-08-14 1:11 UTC (permalink / raw) To: Alexander Kanavin; +Cc: bitbake-devel On Mon, Aug 13, 2018 at 11:20:50PM +0200, Alexander Kanavin wrote: > 2018-08-13 23:04 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: > > To get most out of build host, bitbake now detects if the CPU workload is low. > > If so, additional tasks are spawned. Maximum 'dynamic' tasks are set by > > BB_NUMBER_THREADS_LOW_CPU. > > So the best improvement is going from 114.5 minutes to 113.5 minutes? > I don't think it's worth the trouble. Maybe it's time to invest in 16 > (or even 32!) core amd threadripper? :) IMHO the best improvement was for BB_NUMBER_THREADS = 3, and possibly an even bigger improvement for BB_NUMBER_THREADS = 2. > 2 | 156m48.741s > ------------------------------------ > 3 | 126m27.426s > ------------------------------------ ... > With the patch applied and BB_NUMBER_THREADS_LOW_CPU = "20" (as > written in docs > for max thread count) > > BB_NUMBER_THREADS | Build time [s] > ------------------------------------ > 3 | 114m48.105s I'm running with BB_NUMBER_THREADS = 2 on similar HW (8-core Bulldozer FX(tm)-8120, 32GB RAM), because it leaves the desktop usable for other stuff while some build is running in the background. With BB_NUMBER_THREADS = 4, a big build and a bit of bad luck I was getting 4 do_compile tasks like qtbase, chromium, firefox and gimp at the same time, which either makes me drink too much coffee or even invites uncle OOMK. I quite like the idea behind BB_NUMBER_THREADS_LOW_CPU. -- Martin 'JaMa' Jansa jabber: Martin.Jansa@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 1:11 ` Martin Jansa @ 2018-08-14 6:32 ` Mikko.Rapeli 2018-08-14 7:57 ` Andreas Müller 2018-08-14 8:07 ` Alexander Kanavin 0 siblings, 2 replies; 25+ messages in thread From: Mikko.Rapeli @ 2018-08-14 6:32 UTC (permalink / raw) To: martin.jansa; +Cc: bitbake-devel Just my 2 € cents to the discussion: we had to limit number of threads because heavy C++ projects were using all of RAM and causing heavy swapping. Single g++ processes were eating up to 20 gigabytes of physical ram. It's not just the CPU which is the limiting factor to parallel task execution. -Mikko ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 6:32 ` Mikko.Rapeli @ 2018-08-14 7:57 ` Andreas Müller 2018-08-14 8:07 ` Alexander Kanavin 1 sibling, 0 replies; 25+ messages in thread From: Andreas Müller @ 2018-08-14 7:57 UTC (permalink / raw) To: Mikko.Rapeli; +Cc: bitbake-devel On Tue, Aug 14, 2018 at 8:32 AM, <Mikko.Rapeli@bmw.de> wrote: > Just my 2 € cents to the discussion: > > we had to limit number of threads because heavy C++ projects were using > all of RAM and causing heavy swapping. Single g++ processes were eating > up to 20 gigabytes of physical ram. It's not just the CPU which is the > limiting factor to parallel task execution. > LOL: My first approach was to extra-spawn in case the memory occupied is less than 50%. That solution performed really badly. I think adding tasks on a CPU that already runs at ~100% just causes overhead and reduces overall performance. I think the target for best performance is running the CPU at 100% with as few tasks as possible. Again: the major problem with this solution is that a low workload caused by the CPU waiting for the hard disk or for swap spawns additional tasks. That is totally wrong. Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 6:32 ` Mikko.Rapeli 2018-08-14 7:57 ` Andreas Müller @ 2018-08-14 8:07 ` Alexander Kanavin 2018-08-14 9:43 ` Richard Purdie 1 sibling, 1 reply; 25+ messages in thread From: Alexander Kanavin @ 2018-08-14 8:07 UTC (permalink / raw) To: Mikko.Rapeli; +Cc: bitbake-devel 2018-08-14 8:32 GMT+02:00 <Mikko.Rapeli@bmw.de>: > Just my 2 € cents to the discussion: > > we had to limit number of threads because heavy C++ projects were using > all of RAM and causing heavy swapping. Single g++ processes were eating > up to 20 gigabytes of physical ram. It's not just the CPU which is the > limiting factor to parallel task execution. I do believe some kind of clever dynamic limiter would be useful here. Obviously it's an absurd situation when on a 32 core processor there are 32 do_compile c++ tasks, each running 32 instances of g++ - which is the default configuration. On the other hand running things like do_configure or do_install should happen in parallel. I like Andre's idea, but it should also watch the available RAM. Alex ^ permalink raw reply [flat|nested] 25+ messages in thread
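The "absurd situation" is simple multiplication, shown here as a tiny worked example. The function name is hypothetical; the figures (32 tasks, 32 compiler instances each, up to 20 gigabytes per g++ process) are the ones quoted in this thread, and the result is a worst case, not a measurement.

```python
def worst_case_compilers(bb_number_threads, parallel_make_jobs):
    """Upper bound on simultaneous compiler processes.

    Assumes every running task is a do_compile that uses all of its make jobs.
    """
    return bb_number_threads * parallel_make_jobs


processes = worst_case_compilers(32, 32)  # 32 tasks, each with 32 g++ instances
ram_gb = processes * 20                   # if each one needed the 20 GB seen above
```

That is 1024 compiler processes competing for 32 cores, which is why a limit shared across all tasks keeps coming up in this thread.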
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 8:07 ` Alexander Kanavin @ 2018-08-14 9:43 ` Richard Purdie 2018-08-14 9:45 ` Richard Purdie 2018-08-14 11:03 ` Alexander Kanavin 0 siblings, 2 replies; 25+ messages in thread From: Richard Purdie @ 2018-08-14 9:43 UTC (permalink / raw) To: Alexander Kanavin, Mikko.Rapeli, Andreas Müller, Martin Jansa Cc: bitbake-devel On Tue, 2018-08-14 at 10:07 +0200, Alexander Kanavin wrote: > 2018-08-14 8:32 GMT+02:00 <Mikko.Rapeli@bmw.de>: > > Just my 2 € cents to the discussion: > > > > we had to limit number of threads because heavy C++ projects were > > using > > all of RAM and causing heavy swapping. Single g++ processes were > > eating > > up to 20 gigabytes of physical ram. It's not just the CPU which is > > the > > limiting factor to parallel task execution. > > I do believe some kind of clever dynamic limiter would be useful > here. > Obviously it's an absurd situation when on a 32 core processor there > are 32 do_compile c++ tasks, each running 32 instances of g++ - which > is the default configuration. On the other hand running things like > do_configure or do_install should happen in parallel. I like Andre's > idea, but it should also watch the available RAM. If people want to play, I did experiment with "proper" thread pooling: http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue4&id=d66a327fb6189db5de8bc489859235dcba306237 This implements a make job server within bitbake, then connects make to it. The net result is that you can then put a limit on the number of processes across all tasks. I seem to remember there were some bugs with it and not all are listed in the commit message, I don't remember what the other issues were... Bonus marks for connecting in the other parallel "pool" tasks we have in do_package_* and friends but even a common pool for compile would be nice! Cheers, Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
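The idea of one process limit shared across all tasks can be sketched with a pipe-based token pool, the same principle GNU make's jobserver uses: the pipe is preloaded with one byte per free slot, a worker takes a byte before starting a job and writes it back when done. The code below is not taken from the linked commit; the class TokenPool and its method names are hypothetical, and it uses non-blocking reads only so that the example does not hang (a real client would typically block until a token arrives).

```python
import os


class TokenPool:
    """Shared pool of job slots backed by a pipe, one byte per free slot."""

    def __init__(self, slots):
        self.read_fd, self.write_fd = os.pipe()
        os.set_blocking(self.read_fd, False)
        os.write(self.write_fd, b"+" * slots)  # preload one token per slot

    def try_acquire(self):
        """Take one slot; return the token, or None if all slots are in use."""
        try:
            return os.read(self.read_fd, 1)
        except BlockingIOError:
            return None

    def release(self, token):
        """Give the slot back so another process can use it."""
        os.write(self.write_fd, token)

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)
```

Because the pool is just two file descriptors, child processes that inherit them share the same limit, no matter which task started them.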
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 9:43 ` Richard Purdie @ 2018-08-14 9:45 ` Richard Purdie 2018-08-14 10:28 ` Peter Kjellerstedt 2018-08-14 11:03 ` Alexander Kanavin 1 sibling, 1 reply; 25+ messages in thread From: Richard Purdie @ 2018-08-14 9:45 UTC (permalink / raw) To: Alexander Kanavin, Mikko.Rapeli, Andreas Müller, Martin Jansa Cc: bitbake-devel On Tue, 2018-08-14 at 10:43 +0100, Richard Purdie wrote: > On Tue, 2018-08-14 at 10:07 +0200, Alexander Kanavin wrote: > > 2018-08-14 8:32 GMT+02:00 <Mikko.Rapeli@bmw.de>: > > > Just my 2 € cents to the discussion: > > > > > > we had to limit number of threads because heavy C++ projects were > > > using > > > all of RAM and causing heavy swapping. Single g++ processes were > > > eating > > > up to 20 gigabytes of physical ram. It's not just the CPU which > > > is > > > the > > > limiting factor to parallel task execution. > > > > I do believe some kind of clever dynamic limiter would be useful > > here. > > Obviously it's an absurd situation when on a 32 core processor > > there > > are 32 do_compile c++ tasks, each running 32 instances of g++ - > > which > > is the default configuration. On the other hand running things like > > do_configure or do_install should happen in parallel. I like > > Andre's > > idea, but it should also watch the available RAM. > > If people want to play, I did experiment with "proper" thread > pooling: > > http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue4&id=d66a327fb6189db5de8bc489859235dcba306237 More recent version of the patch: http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue7&id=236ca8be128ba7a4edb9fb2c9e512d181679eee8 Cheers, Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 9:45 ` Richard Purdie @ 2018-08-14 10:28 ` Peter Kjellerstedt 2018-08-14 10:43 ` Andreas Müller 2018-08-14 11:07 ` Alexander Kanavin 0 siblings, 2 replies; 25+ messages in thread From: Peter Kjellerstedt @ 2018-08-14 10:28 UTC (permalink / raw) To: Richard Purdie, Alexander Kanavin, Mikko.Rapeli@bmw.de, Andreas Müller, Martin Jansa Cc: bitbake-devel@lists.openembedded.org > -----Original Message----- > From: bitbake-devel-bounces@lists.openembedded.org <bitbake-devel- > bounces@lists.openembedded.org> On Behalf Of Richard Purdie > Sent: den 14 augusti 2018 11:45 > To: Alexander Kanavin <alex.kanavin@gmail.com>; Mikko.Rapeli@bmw.de; > Andreas Müller <schnitzeltony@gmail.com>; Martin Jansa > <martin.jansa@gmail.com> > Cc: bitbake-devel@lists.openembedded.org > Subject: Re: [bitbake-devel] [PATCH 2/2] runqueue: Introduce load > balanced task spawning > > On Tue, 2018-08-14 at 10:43 +0100, Richard Purdie wrote: > > On Tue, 2018-08-14 at 10:07 +0200, Alexander Kanavin wrote: > > > 2018-08-14 8:32 GMT+02:00 <Mikko.Rapeli@bmw.de>: > > > > Just my 2 € cents to the discussion: > > > > > > > > we had to limit number of threads because heavy C++ projects > > > > were using all of RAM and causing heavy swapping. Single g++ > > > > processes were eating up to 20 gigabytes of physical ram. > > > > It's not just the CPU which is the limiting factor to > > > > parallel task execution. > > > > > > I do believe some kind of clever dynamic limiter would be > > > useful here. Obviously it's an absurd situation when on a 32 > > > core processor there are 32 do_compile c++ tasks, each running > > > 32 instances of g++ - which is the default configuration. On > > > the other hand running things like do_configure or do_install > > > should happen in parallel. I like Andre's idea, but it should > > > also watch the available RAM. 
> > > > If people want to play, I did experiment with "proper" thread > > pooling: > > > > http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue4&id=d66a327fb6189db5de8bc489859235dcba306237 > > More recent version of the patch: > > http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue7&id=236ca8be128ba7a4edb9fb2c9e512d181679eee8 > > Cheers, > > Richard Even though make is the prevalent tool used to build code, we have to consider others such as cmake and meson... Any idea if their parallelism can be controlled in some similar way? //Peter ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 10:28 ` Peter Kjellerstedt @ 2018-08-14 10:43 ` Andreas Müller 2018-08-14 11:07 ` Alexander Kanavin 1 sibling, 0 replies; 25+ messages in thread From: Andreas Müller @ 2018-08-14 10:43 UTC (permalink / raw) To: Peter Kjellerstedt; +Cc: bitbake-devel@lists.openembedded.org On Tue, Aug 14, 2018 at 12:28 PM, Peter Kjellerstedt <peter.kjellerstedt@axis.com> wrote: >> -----Original Message----- >> From: bitbake-devel-bounces@lists.openembedded.org <bitbake-devel- >> bounces@lists.openembedded.org> On Behalf Of Richard Purdie >> Sent: den 14 augusti 2018 11:45 >> To: Alexander Kanavin <alex.kanavin@gmail.com>; Mikko.Rapeli@bmw.de; >> Andreas Müller <schnitzeltony@gmail.com>; Martin Jansa >> <martin.jansa@gmail.com> >> Cc: bitbake-devel@lists.openembedded.org >> Subject: Re: [bitbake-devel] [PATCH 2/2] runqueue: Introduce load >> balanced task spawning >> >> On Tue, 2018-08-14 at 10:43 +0100, Richard Purdie wrote: >> > On Tue, 2018-08-14 at 10:07 +0200, Alexander Kanavin wrote: >> > > 2018-08-14 8:32 GMT+02:00 <Mikko.Rapeli@bmw.de>: >> > > > Just my 2 € cents to the discussion: >> > > > >> > > > we had to limit number of threads because heavy C++ projects >> > > > were using all of RAM and causing heavy swapping. Single g++ >> > > > processes were eating up to 20 gigabytes of physical ram. >> > > > It's not just the CPU which is the limiting factor to >> > > > parallel task execution. >> > > >> > > I do believe some kind of clever dynamic limiter would be >> > > useful here. Obviously it's an absurd situation when on a 32 >> > > core processor there are 32 do_compile c++ tasks, each running >> > > 32 instances of g++ - which is the default configuration. On >> > > the other hand running things like do_configure or do_install >> > > should happen in parallel. I like Andre's idea, but it should >> > > also watch the available RAM. 
>> > >> > If people want to play, I did experiment with "proper" thread >> > pooling: >> > >> > http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue4&id=d66a327fb6189db5de8bc489859235dcba306237 >> >> More recent version of the patch: >> >> http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue7&id=236ca8be128ba7a4edb9fb2c9e512d181679eee8 >> >> Cheers, >> >> Richard > > Even though make is the prevalent tool used to build code, we have > to consider others such as cmake and meson... Any idea if their > parallelism can be controlled in some similar way? > BTW: do_package_ipk is also a CPU eater these days although these task do not last as long as compile. Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-14 10:28 ` Peter Kjellerstedt 2018-08-14 10:43 ` Andreas Müller @ 2018-08-14 11:07 ` Alexander Kanavin 1 sibling, 0 replies; 25+ messages in thread From: Alexander Kanavin @ 2018-08-14 11:07 UTC (permalink / raw) To: Peter Kjellerstedt; +Cc: bitbake-devel@lists.openembedded.org 2018-08-14 12:28 GMT+02:00 Peter Kjellerstedt <peter.kjellerstedt@axis.com>: > Even though make is the prevalent tool used to build code, we have > to consider others such as cmake and meson... Any idea if their > parallelism can be controlled in some similar way? These tools (and qmake etc.) only configure the builds, with makefiles as output. They delegate the actual build job execution to make and/or ninja. Alex ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning
  2018-08-14 9:43 ` Richard Purdie
  2018-08-14 9:45 ` Richard Purdie
@ 2018-08-14 11:03 ` Alexander Kanavin
  2018-08-15 12:43 ` Alexander Kanavin
  1 sibling, 1 reply; 25+ messages in thread
From: Alexander Kanavin @ 2018-08-14 11:03 UTC (permalink / raw)
  To: Richard Purdie; +Cc: bitbake-devel

2018-08-14 11:43 GMT+02:00 Richard Purdie <richard.purdie@linuxfoundation.org>:
> If people want to play, I did experiment with "proper" thread pooling:
>
> http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue4&id=d66a327fb6189db5de8bc489859235dcba306237
>
> This implements a make job server within bitbake, then connects make to
> it. The net result is that you can then put a limit on the number of
> processes across all tasks.
>
> I seem to remember there were some bugs with it and not all are listed
> in the commit message, I don't remember what the other issues were...

Both make and ninja have -l option:

  -l [load], --load-average[=load]
      Specifies that no new jobs (commands) should be
      started if there are others jobs running and the load average is at
      least load (a floating-point number). With no argument, removes a
      previous load limit.

  -l N  do not start new jobs if the load average is greater than N

Maybe that could be appended to PARALLEL_MAKE, which is far less
invasive than any other approach?

Alex

^ permalink raw reply [flat|nested] 25+ messages in thread
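To make the suggestion above concrete, here is a minimal sketch of how a PARALLEL_MAKE-style value with a load limit appended might be built. The helper name `parallel_make_flags` and the `load_factor` parameter are hypothetical and not part of bitbake or OE-core; only the `-j` and `-l` options themselves come from make and ninja.

```python
import os


def parallel_make_flags(cores=None, load_factor=1.0):
    """Build a '-j N -l L' flag string.

    Hypothetical helper for illustration only; not a bitbake API.
    """
    # Fall back to the host's core count when none is given.
    n = cores if cores is not None else (os.cpu_count() or 1)
    # -l takes a load average threshold; here it scales with the core count.
    load_limit = n * load_factor
    # '{:g}' prints 8.0 as '8' and 6.5 as '6.5'.
    return "-j {} -l {:g}".format(n, load_limit)
```

With this shape, each make or ninja invocation keeps its `-j` ceiling but stops spawning new jobs once the system load reaches the chosen threshold, which is the behaviour the `-l` option is documented to provide.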
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning
  2018-08-14 11:03 ` Alexander Kanavin
@ 2018-08-15 12:43 ` Alexander Kanavin
  2018-08-15 15:01 ` Andreas Müller
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Kanavin @ 2018-08-15 12:43 UTC (permalink / raw)
  To: Richard Purdie; +Cc: bitbake-devel

2018-08-14 13:03 GMT+02:00 Alexander Kanavin <alex.kanavin@gmail.com>:
> Both make and ninja have -l option:
>
> -l [load], --load-average[=load]
> Specifies that no new jobs (commands) should be
> started if there are others jobs running and the load average is at
> least load (a floating-point number). With no argument, removes a
> previous load limit.
>
> -l N do not start new jobs if the load average is greater than N
>
> Maybe that could be appended to PARALLEL_MAKE, which is far less
> invasive than any other approach?

I've done some tests, and yes -l does help. We currently have a nasty
quadratic growth rate with cpu cores as input, which hits especially
badly when the amount of cores is high, and a lot of long, heavy (e.g.
c++) do_compile tasks run at once. This potentially means n*n compiler
instances, where n is how many cpu cores are available. '-l' option
does neatly limit that to a constant amount of compilers per core.

However, this does not solve the other resource problem: running out
of available RAM and going into swap thrashing. Neither make nor ninja
can currently watch the RAM, even though it is not complicated:

>>> import psutil
>>> psutil.virtual_memory()
svmem(total=16536903680, available=7968600064, percent=51.8,
used=16347615232, free=189288448, active=11750494208,
inactive=2882383872, buffers=3158528000, cached=4620783616)

I think we should teach both to do that, and then replace a static
'number of jobs' limit in PARALLEL_MAKE with limits on CPU load and
RAM usage.

Alex

^ permalink raw reply [flat|nested] 25+ messages in thread
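The idea in the message above can be sketched as a small gate that refuses new jobs when either the load average or the available RAM crosses a limit. This is only an illustration under stated assumptions: the function names (`worst_case_compilers`, `read_mem_available`, `may_start_job`) and both thresholds are hypothetical, neither make nor ninja exposes such a check, and the sketch reads Linux's /proc/meminfo from the standard library instead of using psutil so that it has no third-party dependency.

```python
import os


def worst_case_compilers(cores):
    """Upper bound on compiler processes when both the number of parallel
    tasks and each task's -j value equal the core count (n * n)."""
    return cores * cores


def read_mem_available(path="/proc/meminfo"):
    """Return MemAvailable in bytes from a Linux meminfo file, or None
    if the file or the field is missing."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    # The kernel reports this value in kB.
                    return int(line.split()[1]) * 1024
    except OSError:
        pass
    return None


def may_start_job(max_load, min_available, load=None, available=None):
    """Hypothetical gate: allow a new job only if load and RAM permit.

    load and available can be passed in explicitly (useful for testing);
    otherwise they are sampled from the running system.
    """
    if load is None:
        load = os.getloadavg()[0]  # 1-minute load average (Unix only)
    if available is None:
        available = read_mem_available()
    if load >= max_load:
        return False
    # If available memory cannot be determined, do not block on it.
    if available is not None and available < min_available:
        return False
    return True
```

For the 32-core case discussed earlier in the thread, `worst_case_compilers(32)` gives 1024 potential compiler processes, which is the situation a combined load and memory limit would be meant to avoid.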
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-15 12:43 ` Alexander Kanavin @ 2018-08-15 15:01 ` Andreas Müller 2018-08-15 16:26 ` Alexander Kanavin 0 siblings, 1 reply; 25+ messages in thread From: Andreas Müller @ 2018-08-15 15:01 UTC (permalink / raw) To: Alexander Kanavin; +Cc: bitbake-devel On Wed, Aug 15, 2018 at 2:43 PM, Alexander Kanavin <alex.kanavin@gmail.com> wrote: > 2018-08-14 13:03 GMT+02:00 Alexander Kanavin <alex.kanavin@gmail.com>: >> Both make and ninja have -l option: >> >> -l [load], --load-average[=load] >> Specifies that no new jobs (commands) should be >> started if there are others jobs running and the load average is at >> least load (a floating-point number). With no argument, removes a >> previous load limit. >> >> -l N do not start new jobs if the load average is greater than N >> >> Maybe that could be appended to PARALLEL_MAKE, which is far less >> invasive than any other approach? > > I've done some tests, and yes -l does help. We currently have a nasty > quadratic growth rate with cpu cores as input, which hits especially > badly when the amount of cores is high, and a lot of long, heavy (e.g. > c++) do_compile tasks run at once. This potentially means n*n compiler > instances, where n is how many cpu cores are available. '-l' option > does neatly limit that to a constant amount of compilers per core. > > However, this does not solve the other resource problem: running out > of available RAM and going into swap thrashing. Neither make nor ninja > can currently watch the RAM, even though it is not complicated: >>>> import psutil >>>> psutil.virtual_memory() > svmem(total=16536903680, available=7968600064, percent=51.8, > used=16347615232, free=189288448, active=11750494208, > inactive=2882383872, buffers=3158528000, cached=4620783616) > > I think we should teach both to do that, and then replace a static > 'number of jobs' limit in PARALLEL_MAKE with limits on CPU load and > RAM usage. > > Alex 1. 
I like -l - I have to try it too!! 2. Quadratic explosion: I think you should reduce the number of parallel bitbake threads. From what I have seen: when running heavy compiles with -j = number-of-cores, it takes only 2-3 do_compiles to keep the CPU load at a constant ~100%. I think this should be independent of the number of cores. Try 4-5 parallel bitbake threads - I bet that speeds up your builds and reduces swap floods. Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] runqueue: Introduce load balanced task spawning 2018-08-15 15:01 ` Andreas Müller @ 2018-08-15 16:26 ` Alexander Kanavin 0 siblings, 0 replies; 25+ messages in thread From: Alexander Kanavin @ 2018-08-15 16:26 UTC (permalink / raw) To: Andreas Müller; +Cc: bitbake-devel 2018-08-15 17:01 GMT+02:00 Andreas Müller <schnitzeltony@gmail.com>: > 2. Quadratic explosion: I think it you should reduce number of > parallel bitbake threads. From what I have seen: When running heavy > compiles with -j = number-of-cores it takes only 2-3 do_compiles to > have CPU load at constant ~100%. I think this should be independent of > number of cores. Try 4-5 parallel bitbake threads - I bet that speeds > up your builds and reduces swap floods. What about the situations where bitbake is mostly busy with other things than do_compile, or when do_compile takes only a small fraction of the recipes build time? Particularly do_configure can be notoriously slow and single-threaded, so I do want to run those with all available cores. Alex ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2018-08-15 16:26 UTC | newest]

Thread overview: 25+ messages
2018-08-13 21:04 [PATCH 0/2] Spawn extra tasks at low CPU worload Andreas Müller
2018-08-13 21:04 ` [PATCH 1/2] runqueue: Move decision if a task can be started to one common place Andreas Müller
2018-08-15 8:37 ` Richard Purdie
2018-08-15 8:43 ` Andreas Müller
2018-08-13 21:04 ` [PATCH 2/2] runqueue: Introduce load balanced task spawning Andreas Müller
2018-08-13 21:20 ` Alexander Kanavin
2018-08-13 21:30 ` Andreas Müller
2018-08-13 22:37 ` Andre McCurdy
2018-08-14 0:11 ` Andreas Müller
2018-08-14 1:01 ` Andre McCurdy
2018-08-14 9:05 ` Andreas Müller
2018-08-14 9:18 ` Alexander Kanavin
2018-08-14 1:11 ` Martin Jansa
2018-08-14 6:32 ` Mikko.Rapeli
2018-08-14 7:57 ` Andreas Müller
2018-08-14 8:07 ` Alexander Kanavin
2018-08-14 9:43 ` Richard Purdie
2018-08-14 9:45 ` Richard Purdie
2018-08-14 10:28 ` Peter Kjellerstedt
2018-08-14 10:43 ` Andreas Müller
2018-08-14 11:07 ` Alexander Kanavin
2018-08-14 11:03 ` Alexander Kanavin
2018-08-15 12:43 ` Alexander Kanavin
2018-08-15 15:01 ` Andreas Müller
2018-08-15 16:26 ` Alexander Kanavin