* request for comments - dynamic task selection
@ 2014-02-20 19:17 Damian, Alexandru
2014-02-22 11:22 ` Richard Purdie
0 siblings, 1 reply; 2+ messages in thread
From: Damian, Alexandru @ 2014-02-20 19:17 UTC (permalink / raw)
To: bitbake-devel
[-- Attachment #1.1: Type: text/plain, Size: 1250 bytes --]
Hello,
I have a RFC patch that attempts to improve build performance by
dynamically selecting the next-to-run task based on currently running
tasks. The general idea is if we overload the network with too many fetch
tasks, it's better to start doing unpack tasks while the disks idle instead
of starting new fetch tasks. The concept can be generally applied, e.g.
it's better to schedule package tasks in parallel with compile tasks
instead of running all the compile tasks first and then all package tasks.
This patch just looks at currently running tasks and selects the next
available task that hasn't a similar-type task already running.
I'm seeing a roughly 2% build time reduction when building the
virtual:native components, with just 26 tasks reordered out of 244 tasks
executed. I am attaching a toaster.sqlite file that captures the test -
build no.1 is done with the origin/master source, and build no. 2 is done
with this patch applied.
DO NOT MERGE - this patch exposes dependency problems in OE-Core,
specifically GCC 4.8 recipes. It breaks the OE-Core builds.
I'm waiting for your feedback. Meanwhile, I'm gonna try to solve the GCC
problems exposed.
--
Alex Damian
Yocto Project
SSG / OTC
[-- Attachment #1.2: Type: text/html, Size: 2026 bytes --]
[-- Attachment #2: 0001-bitbake-runqueue-improve-task-selection-algorithm.patch --]
[-- Type: text/x-patch, Size: 3159 bytes --]
From 51d008326c8a9c8c76e2f7f7d46f8dbc6b4996e7 Mon Sep 17 00:00:00 2001
From: Alexandru DAMIAN <alexandru.damian@intel.com>
Date: Tue, 18 Feb 2014 17:43:43 +0000
Subject: [PATCH 1/1] bitbake: runqueue: improve task selection algorithm
When selecting the next task to run, Bitbake uses a
priority map to select one task from the list of currently
buildable tasks. The priority map is statically decided before
starting the run through the use of the scheduler.
This patch changes the task selection to add dynamic selection
based on currently running tasks. The basic assumption is that
we get a lower total build time if we select tasks as to
allow a better spread of load across subsystems, ie. select
disk-intensive tasks instead of network-intensive tasks if there
are too many network tasks currently running.
First implementation simply looks at currently running tasks
and favors selecting a new task based on not having another task
of the same type currently running.
Signed-off-by: Alexandru DAMIAN <alexandru.damian@intel.com>
---
bitbake/lib/bb/runqueue.py | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 413d59f..290320b 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -135,16 +135,33 @@ class RunQueueScheduler(object):
for taskid in xrange(self.numTasks):
self.rev_prio_map[self.prio_map[taskid]] = taskid
- best = None
- bestprio = None
- for taskid in self.buildable:
- prio = self.rev_prio_map[taskid]
- if not bestprio or bestprio > prio:
- stamp = self.stamps[taskid]
- if stamp in self.rq.build_stamps.itervalues():
- continue
- bestprio = prio
- best = taskid
+ runningtasks = []
+ for taskid in xrange(self.numTasks):
+ if self.rq.runq_running[taskid] == 1 and self.rq.runq_complete[taskid] == 0:
+ runningtasks.append(taskid)
+
+ if (len(runningtasks)):
+ runningtasknames = {}
+ for taskid in runningtasks:
+ taskname = self.rqdata.runq_task[taskid]
+ if taskname in runningtasknames.keys():
+ runningtasknames[taskname] += 1
+ else:
+ runningtasknames[taskname] = 1
+
+ # we select a task that isn't in the list of currently running task names
+ b = sorted(self.buildable, key=lambda x: self.rev_prio_map[x])
+ for taskid in b:
+ taskname = self.rqdata.runq_task[taskid]
+ if not taskname in runningtasknames:
+ stamp = self.stamps[taskid]
+ if stamp in self.rq.build_stamps.itervalues():
+ continue
+ #bb.note("Selected task %s" % self.rqdata.runq_task[taskid])
+ return taskid
+
+ b = sorted(self.buildable, key=lambda x: self.rev_prio_map[x])
+ return b[0]
return best
--
1.8.3.2
[-- Attachment #3: toaster.sqlite.xz --]
[-- Type: application/x-xz, Size: 178344 bytes --]
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: request for comments - dynamic task selection
2014-02-20 19:17 request for comments - dynamic task selection Damian, Alexandru
@ 2014-02-22 11:22 ` Richard Purdie
0 siblings, 0 replies; 2+ messages in thread
From: Richard Purdie @ 2014-02-22 11:22 UTC (permalink / raw)
To: Damian, Alexandru; +Cc: bitbake-devel
On Thu, 2014-02-20 at 19:17 +0000, Damian, Alexandru wrote:
> I have a RFC patch that attempts to improve build performance by
> dynamically selecting the next-to-run task based on currently running
> tasks. The general idea is if we overload the network with too many
> fetch tasks, it's better to start doing unpack tasks while the disks
> idle instead of starting new fetch tasks. The concept can be generally
> applied, e.g. it's better to schedule package tasks in parallel with
> compile tasks instead of running all the compile tasks first and then
> all package tasks.
>
> This patch just looks at currently running tasks and selects the next
> available task that hasn't a similar-type task already running.
>
> I'm seeing a roughly 2% build time reduction when building the
> virtual:native components, with just 26 tasks reordered out of 244
> tasks executed. I am attaching a toaster.sqlite file that captures the
> test - build no.1 is done with the origin/master source, and build no.
> 2 is done with this patch applied.
The trouble with scheduler changes is that they're hard to evaluate as
the benefits may appear on one machine but cause issues on another.
Firstly, I'd like to see this as a new scheduler, not changing the
existing one. I'd then like Stefan to try this against our standard
performance tests, see what it does to our various benchmarks there.
I'd also note that do_fetch is not the best task for optimisation since
most people have relatively up to date source directories locally at any
point after their first build.
Cheers,
Richard
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2014-02-22 11:23 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-20 19:17 request for comments - dynamic task selection Damian, Alexandru
2014-02-22 11:22 ` Richard Purdie
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.