[PATCH 0/3] rm_work enhancements

* [PATCH 0/3] rm_work enhancements
@ 2017-01-06  9:50 Patrick Ohly
  2017-01-06  9:50 ` [PATCH 1/3] recipes: anonymous functions with priorities Patrick Ohly
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Patrick Ohly @ 2017-01-06  9:50 UTC (permalink / raw)
  To: bitbake-devel

This is the bitbake side of the rm_work.bbclass enhancements. See the
OE-core "rm_work + pybootchart enhancements" mail thread for further
information.

Related-to: YOCTO #10584

The following changes since commit ae6045b84978940c365c95c33d6996359c3e299d:

  bb/cooker: BBCooker stops notifier at shutdown (2017-01-06 00:01:03 +0000)

are available in the git repository at:

  git://github.com/pohly/bitbake rmwork
  https://github.com/pohly/bitbake/tree/rmwork

Patrick Ohly (3):
  recipes: anonymous functions with priorities
  build.py: add preceedtask() API
  runqueue.py: alternative rm_work scheduler

 lib/bb/build.py      |  16 ++++++++
 lib/bb/data_smart.py |   3 ++
 lib/bb/parse/ast.py  |  19 +++++++--
 lib/bb/runqueue.py   | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 142 insertions(+), 3 deletions(-)

-- 
2.1.4




* [PATCH 1/3] recipes: anonymous functions with priorities
  2017-01-06  9:50 [PATCH 0/3] rm_work enhancements Patrick Ohly
@ 2017-01-06  9:50 ` Patrick Ohly
  2017-01-09 11:03   ` Patrick Ohly
  2017-01-06  9:50 ` [PATCH 2/3] build.py: add preceedtask() API Patrick Ohly
  2017-01-06  9:50 ` [PATCH 3/3] runqueue.py: alternative rm_work scheduler Patrick Ohly
  2 siblings, 1 reply; 5+ messages in thread
From: Patrick Ohly @ 2017-01-06  9:50 UTC (permalink / raw)
  To: bitbake-devel

The execution order of anonymous functions inside a recipe is
deterministic (executed in the order in which the functions get
parsed), but class authors cannot know in which order classes get
inherited and thus cannot know whether the class functions run before
or after functions in other classes. That is a problem for
rm_work.bbclass, which must add its own task after all other
classes have added theirs.

As an extension of the current syntax, now any function whose name
*starts* with "__anonymous" is considered an anonymous function;
previously, anything with *exactly* that name was an anonymous
function.

In contrast to those which are named just "__anonymous" or have no
name at all, the "__anonymous_<something>" name must be globally
unique and can be used to set a "__anonprio" variable flag for the
function.

The default priority is 100. Functions with higher values run later.

For example, rm_work.bbclass can use this to ensure that it runs
later than other anonymous functions:

python __anonymous_rm_work() {
    bb.build.addtask(....)
}
__anonymous_rm_work[__anonprio] = "1000"
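
The resulting ordering can be illustrated outside of bitbake. The following
is only a minimal sketch: a plain dict stands in for the datastore flags, and
the helper name order_anonymous() and the two generated "__anon_..." names
are made up for this illustration. It relies on Python's sorted() being
stable, so functions with equal priority keep their definition order.

```python
DEFAULT_PRIO = 100  # default priority stated in the commit message

def order_anonymous(funcs, flags):
    """Return funcs sorted by ascending priority; ties keep definition order."""
    return sorted(funcs, key=lambda f: int(flags.get(f) or DEFAULT_PRIO))

# Definition order as it might look after parsing (names are hypothetical).
funcs = ["__anon_10_a_bbclass", "__anonymous_rm_work", "__anon_20_b_bbclass"]
# Only rm_work sets a priority; the others fall back to the default.
flags = {"__anonymous_rm_work": "1000"}

ordered = order_anonymous(funcs, flags)
print(ordered)
```

The function with priority 1000 ends up last, while the two functions with
the default priority stay in the order in which they were defined.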

The priorities influence parsing results and thus need to be included
in the base hash. That part was necessary to trigger re-parsing when
modifying the priority in rm_work.bbclass, which gets inherited
globally via INHERIT.
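
To show why the priority has to be part of the hashed data, here is a
simplified sketch of the hashing step. It is not the bitbake implementation:
base_hash() and its dict arguments are assumptions made for this example,
and only the way the "[__anonprio]" entry is added mirrors the patch.

```python
import hashlib

def base_hash(funcs, bodies, prios):
    """Hash function bodies plus any explicitly set priorities."""
    data = {}
    for name in funcs:
        data[name] = bodies.get(name, "")
        prio = prios.get(name)
        if prio is not None:
            # Same key format as in the patch: "<funcname>[__anonprio]"
            data["%s[__anonprio]" % name] = prio
    data_str = str([(k, data[k]) for k in sorted(data.keys())])
    return hashlib.md5(data_str.encode("utf-8")).hexdigest()

funcs = ["__anonymous_rm_work"]
bodies = {"__anonymous_rm_work": "bb.build.addtask(...)"}

h_before = base_hash(funcs, bodies, {"__anonymous_rm_work": "1000"})
h_after = base_hash(funcs, bodies, {"__anonymous_rm_work": "2000"})
print(h_before != h_after)
```

Changing only the priority changes the hash, which is what triggers the
re-parse.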

Related-to: [YOCTO #10584]

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
 lib/bb/data_smart.py |  3 +++
 lib/bb/parse/ast.py  | 19 ++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/lib/bb/data_smart.py b/lib/bb/data_smart.py
index 79d591a..4b1e9fb 100644
--- a/lib/bb/data_smart.py
+++ b/lib/bb/data_smart.py
@@ -1002,6 +1002,9 @@ class DataSmart(MutableMapping):
                 for i in bb_list:
                     value = d.getVar(i, False) or ""
                     data.update({i:value})
+                    priority = d.getVarFlag(i, '__anonprio', False)
+                    if priority is not None:
+                        data.update({'%s[__anonprio]' % i: priority})
 
         data_str = str([(k, data[k]) for k in sorted(data.keys())])
         return hashlib.md5(data_str.encode("utf-8")).hexdigest()
diff --git a/lib/bb/parse/ast.py b/lib/bb/parse/ast.py
index 853dda8..6255cc4 100644
--- a/lib/bb/parse/ast.py
+++ b/lib/bb/parse/ast.py
@@ -175,8 +175,15 @@ class MethodNode(AstNode):
     def eval(self, data):
         text = '\n'.join(self.body)
         funcname = self.func_name
-        if self.func_name == "__anonymous":
-            funcname = ("__anon_%s_%s" % (self.lineno, self.filename.translate(MethodNode.tr_tbl)))
+        if self.func_name.startswith("__anonymous"):
+            if self.func_name == "__anonymous":
+                # If truly anonymous, then we have to create a unique name.
+                funcname = ("__anon_%s_%s" % (self.lineno, self.filename.translate(MethodNode.tr_tbl)))
+            else:
+                # The recipe already has a unique name. Let's use that to ensure
+                # that any flags set for that name (like __anonymous_my_func[__anonprio] = "1000")
+                # can be found later.
+                funcname = self.func_name
             self.python = True
             text = "def %s(d):\n" % (funcname) + text
             bb.methodpool.insert_method(funcname, text, self.filename, self.lineno - len(self.body))
@@ -352,7 +359,13 @@ def finalize(fn, d, variant = None):
     bb.data.expandKeys(d)
     bb.data.update_data(d)
     code = []
-    for funcname in d.getVar("__BBANONFUNCS", False) or []:
+    # Anonymous functions with lower priority get executed first.
+    # sorted() is stable, so for entries with the same priority,
+    # execution is in the order of definition.
+    anonymous = sorted(d.getVar("__BBANONFUNCS", False) or [],
+                       key=lambda f: int(d.getVarFlag(f, '__anonprio', True) or 100))
+    d.setVar("__BBANONFUNCS", anonymous)
+    for funcname in anonymous:
         code.append("%s(d)" % funcname)
     bb.utils.better_exec("\n".join(code), {"d": d})
     bb.data.update_data(d)
-- 
2.1.4




* Re: [PATCH 1/3] recipes: anonymous functions with priorities
  2017-01-06  9:50 ` [PATCH 1/3] recipes: anonymous functions with priorities Patrick Ohly
@ 2017-01-09 11:03   ` Patrick Ohly
  0 siblings, 0 replies; 5+ messages in thread
From: Patrick Ohly @ 2017-01-09 11:03 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Richard Purdie

On Fri, 2017-01-06 at 10:50 +0100, Patrick Ohly wrote:
> The default priority is 100. Functions with higher values run later.
> 
> For example, rm_work.bbclass can use this to ensure that it runs
> later than other anonymous functions:
> 
> python __anonymous_rm_work() {
>     bb.build.addtask(....)
> }
> __anonymous_rm_work[priority] = "1000"

We discussed this a bit on chat, and there were concerns that priorities
are not defined well enough and will cause maintenance problems. Let me
add here that this was partly intentional: bitbake doesn't need to care
about specific priorities, that's something that the users of bitbake
(like OE-core) need to define. Admittedly my OE-core patch set did not
address that either, although it could be added with a documentation
patch.

As an alternative, Richard suggested adding more events that get emitted
at well-defined points in the processing of a recipe. However, as he said
himself, that then leads to the problem of ordering event handlers once
there is more than one handler that wants to run for a certain event.
IMHO that just reduces the scope of the problem, but does not really
solve it.

Basically, there would have to be a dedicated event for use only by
rm_work and nothing else, which kind of breaks the separation between
bitbake mechanisms and their use in meta data. Each time meta data needs
a new separate "slot", bitbake would have to be extended, which isn't
the case with the priority approach where each priority value already is
a separate slot.

Having said that, I don't have any strong opinion about this, I just
wanted to share some more thoughts on this.

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.


* [PATCH 2/3] build.py: add preceedtask() API
  2017-01-06  9:50 [PATCH 0/3] rm_work enhancements Patrick Ohly
  2017-01-06  9:50 ` [PATCH 1/3] recipes: anonymous functions with priorities Patrick Ohly
@ 2017-01-06  9:50 ` Patrick Ohly
  2017-01-06  9:50 ` [PATCH 3/3] runqueue.py: alternative rm_work scheduler Patrick Ohly
  2 siblings, 0 replies; 5+ messages in thread
From: Patrick Ohly @ 2017-01-06  9:50 UTC (permalink / raw)
  To: bitbake-devel

The API is required by the revised rm_work.bbclass implementation,
which needs to know all tasks that do_build depends on so that it
can properly inject itself between do_build and those tasks.

The new API primarily hides the internal implementation of the "after"
and "before" dependency tracking. Because tasks defined as
preconditions via "recrdeptask" may or may not be relevant (they are for
rm_work.bbclass), the API also includes support for that.

There's no default value for including recrdeptasks, so developers
have to think about what they need.
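
The behavior can be tried outside of bitbake with a small stand-in for the
datastore. This is only a sketch: the FakeData class and the task names in
the example are hypothetical, and only the body of preceedtask() follows
the patch.

```python
class FakeData:
    """Hypothetical stand-in for the datastore; supports getVarFlag() only."""
    def __init__(self, flags):
        self._flags = flags

    def getVarFlag(self, var, flag):
        return self._flags.get(var, {}).get(flag)

def preceedtask(task, with_recrdeptasks, d):
    """Collect tasks that must run before 'task' (same logic as the patch)."""
    preceed = set()
    preceed.update(d.getVarFlag(task, 'deps') or [])
    if with_recrdeptasks:
        recrdeptask = d.getVarFlag(task, 'recrdeptask')
        if recrdeptask:
            preceed.update(recrdeptask.split())
    return preceed

# Made-up flags for do_build, just to exercise both code paths.
d = FakeData({"do_build": {"deps": ["do_package_qa", "do_image"],
                           "recrdeptask": "do_build do_deploy"}})

without = preceedtask("do_build", False, d)
with_rec = preceedtask("do_build", True, d)
print(sorted(without))
print(sorted(with_rec))
```

The second call also shows the caveat from the docstring: with
recrdeptasks included, the task itself can appear in the result.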

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
 lib/bb/build.py | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/bb/build.py b/lib/bb/build.py
index 271cda6..c6104a4 100644
--- a/lib/bb/build.py
+++ b/lib/bb/build.py
@@ -862,3 +862,19 @@ def deltask(task, d):
         if task in deps:
             deps.remove(task)
             d.setVarFlag(bbtask, 'deps', deps)
+
+def preceedtask(task, with_recrdeptasks, d):
+    """
+    Returns a set of tasks in the current recipe which were specified as
+    precondition by the task itself ("after") or which listed themselves
+    as precondition ("before"). Preceeding tasks specified via the
+    "recrdeptask" are included in the result only if requested. Beware
+    that this may lead to the task itself being listed.
+    """
+    preceed = set()
+    preceed.update(d.getVarFlag(task, 'deps') or [])
+    if with_recrdeptasks:
+        recrdeptask = d.getVarFlag(task, 'recrdeptask')
+        if recrdeptask:
+            preceed.update(recrdeptask.split())
+    return preceed
-- 
2.1.4




* [PATCH 3/3] runqueue.py: alternative rm_work scheduler
  2017-01-06  9:50 [PATCH 0/3] rm_work enhancements Patrick Ohly
  2017-01-06  9:50 ` [PATCH 1/3] recipes: anonymous functions with priorities Patrick Ohly
  2017-01-06  9:50 ` [PATCH 2/3] build.py: add preceedtask() API Patrick Ohly
@ 2017-01-06  9:50 ` Patrick Ohly
  2 siblings, 0 replies; 5+ messages in thread
From: Patrick Ohly @ 2017-01-06  9:50 UTC (permalink / raw)
  To: bitbake-devel

The idea is that tasks which complete building a recipe (like
do_package_qa) are more important than tasks which start building new
recipes (do_fetch) or those which increase disk usage
(do_compile). Therefore tasks get ordered like this (most important
first; do_build ranks above do_rm_work here because the enhanced
rm_work.bbclass was used, which runs do_rm_work before do_build):

1. ID /work/poky/meta/recipes-support/popt/popt_1.16.bb:do_build
2. ID /work/poky/meta/recipes-core/readline/readline_6.3.bb:do_build
3. ID /work/poky/meta/recipes-connectivity/libnss-mdns/libnss-mdns_0.10.bb:do_build
...
464. ID /work/poky/meta/recipes-sato/images/core-image-sato.bb:do_build
465. ID /work/poky/meta/recipes-graphics/xorg-proto/inputproto_2.3.2.bb:do_rm_work
466. ID /work/poky/meta/recipes-devtools/python/python3_3.5.2.bb:do_rm_work
467. ID /work/poky/meta/recipes-core/packagegroups/packagegroup-base.bb:do_rm_work
...
3620. ID virtual:native:/work/poky/meta/recipes-extended/pbzip2/pbzip2_1.1.13.bb:do_install
3621. ID /work/poky/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb:do_install
3622. ID /work/poky/meta/recipes-core/zlib/zlib_1.2.8.bb:do_compile_ptest_base
3623. ID /work/poky/meta/recipes-extended/bzip2/bzip2_1.0.6.bb:do_compile_ptest_base
...
3645. ID /work/poky/meta/recipes-support/libevent/libevent_2.0.22.bb:do_compile_ptest_base
3646. ID /work/poky/meta/recipes-core/busybox/busybox_1.24.1.bb:do_compile_ptest_base
3647. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_uboot_mkimage
3648. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_sizecheck
3649. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_strip
3650. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_compile_kernelmodules
3651. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_shared_workdir
3652. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_kernel_link_images
3653. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_compile
3654. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_compile
...

The order of the same task between different recipes is the same as
with the speed scheduler, i.e. more important recipes come first.

The new scheduler has to be picked explicitly with
   BB_SCHEDULER = "rm_work"
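
The regrouping step can be sketched in isolation. This is a simplified
illustration, not the scheduler itself: regroup() is a hypothetical helper,
the recipe names are made up, and the merged task order is given directly
instead of being computed from the per-recipe task lists.

```python
def regroup(prio_map, task_order):
    """Group task IDs by task kind, following task_order (most important
    kind first), while keeping the recipe order within each kind."""
    prio_map = list(prio_map)
    task_index = 0
    for task in task_order:
        for index in range(task_index, len(prio_map)):
            taskid = prio_map[index]
            if taskid.rsplit(':', 1)[1] == task:
                # Move the entry up to the end of the already grouped head.
                del prio_map[index]
                prio_map.insert(task_index, taskid)
                task_index += 1
    return prio_map

# Input in the style of the speed scheduler: earlier tasks first,
# with foo being the more important recipe.
speed_order = ["foo.bb:do_fetch", "bar.bb:do_fetch",
               "foo.bb:do_compile", "bar.bb:do_compile",
               "foo.bb:do_build", "bar.bb:do_build"]
# Merged task list after reversing: tasks that finish a recipe come first.
task_order = ["do_build", "do_compile", "do_fetch"]

result = regroup(speed_order, task_order)
print("\n".join(result))
```

Within each task kind, foo still comes before bar, which matches the
statement that the order between recipes is the same as with the speed
scheduler.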

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
 lib/bb/runqueue.py | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
index 48c6a79..bf5b65d 100644
--- a/lib/bb/runqueue.py
+++ b/lib/bb/runqueue.py
@@ -183,6 +183,18 @@ class RunQueueScheduler(object):
     def newbuilable(self, task):
         self.buildable.append(task)
 
+    def describe_task(self, taskid):
+        result = 'ID %s' % taskid
+        if self.rev_prio_map:
+            result = result + (' pri %d' % self.rev_prio_map[taskid])
+        return result
+
+    def dump_prio(self, comment):
+        bb.debug(3, '%s (most important first):\n%s' %
+                 (comment,
+                  '\n'.join(['%d. %s' % (index + 1, self.describe_task(taskid)) for
+                             index, taskid in enumerate(self.prio_map)])))
+
 class RunQueueSchedulerSpeed(RunQueueScheduler):
     """
     A scheduler optimised for speed. The priority map is sorted by task weight,
@@ -242,6 +254,101 @@ class RunQueueSchedulerCompletion(RunQueueSchedulerSpeed):
             for idx in todel:
                 del basemap[idx]
 
+class RunQueueSchedulerRmwork(RunQueueSchedulerSpeed):
+    """
+    A scheduler optimised to complete .bb files as quickly as possible. The
+    priority map is sorted by task weight, but then reordered so that once a given
+    .bb file starts to build, it's completed as quickly as possible by
+    running all tasks related to the same .bb file one after the other.
+    """
+    name = "rm_work"
+
+    def __init__(self, runqueue, rqdata):
+        super(RunQueueSchedulerRmwork, self).__init__(runqueue, rqdata)
+
+        # Extract list of tasks for each recipe, with tasks sorted
+        # ascending from "must run first" (typically do_fetch) to
+        # "runs last" (do_rm_work). Both the speed and completion
+        # schedulers prioritize tasks that must run first before the ones
+        # that run later; this is what we depend on here.
+        task_lists = {}
+        for taskid in self.prio_map:
+            fn, taskname = taskid.rsplit(':', 1)
+            task_lists.setdefault(fn, []).append(taskname)
+
+        # Now unify the different task lists. The strategy is that
+        # common tasks get skipped and new ones get inserted after the
+        # preceeding common one(s) as they are found. Because task
+        # lists should differ only by their number of tasks, but not
+        # the ordering of the common tasks, this should result in a
+        # deterministic result that is a superset of the individual
+        # task ordering.
+        all_tasks = []
+        for recipe, new_tasks in task_lists.items():
+            index = 0
+            old_task = all_tasks[index] if index < len(all_tasks) else None
+            for new_task in new_tasks:
+                if old_task == new_task:
+                    # Common task, skip it. This is the fast-path which
+                    # avoids a full search.
+                    index += 1
+                    old_task = all_tasks[index] if index < len(all_tasks) else None
+                else:
+                    try:
+                        index = all_tasks.index(new_task)
+                        # Already present, just not at the current
+                        # place. We re-synchronized by changing the
+                        # index so that it matches again. Now
+                        # move on to the next existing task.
+                        index += 1
+                        old_task = all_tasks[index] if index < len(all_tasks) else None
+                    except ValueError:
+                        # Not present. Insert before old_task, which
+                        # remains the same (but gets shifted back).
+                        all_tasks.insert(index, new_task)
+                        index += 1
+        bb.debug(3, 'merged task list: %s'  % all_tasks)
+
+        # Now reverse the order so that tasks that finish the work on one
+        # recipe are considered more important (= come first). The ordering
+        # is now so that do_rm_work[_all] is most important.
+        all_tasks.reverse()
+
+        # Group tasks of the same kind before tasks of less important
+        # kinds at the head of the queue (because earlier = lower
+        # priority number = runs earlier), while preserving the
+        # ordering by recipe. If recipe foo is more important than
+        # bar, then the goal is to work on foo's do_populate_sysroot
+        # before bar's do_populate_sysroot and on the more important
+        # tasks of foo before any of the less important tasks in any
+        # other recipe (if those other recipes are more important than
+        # foo).
+        #
+        # All of this only applies when tasks are runnable. Explicit
+        # dependencies still override this ordering by priority.
+        #
+        # Here's an example why this priority re-ordering helps with
+        # minimizing disk usage. Consider a recipe foo with a higher
+        # priority than bar where foo DEPENDS on bar. Then the
+        # implicit rule (from base.bbclass) is that foo's do_configure
+        # depends on bar's do_populate_sysroot. This ensures that
+        # bar's do_populate_sysroot gets done first. Normally the
+        # tasks from foo would continue to run once that is done, and
+        # bar only gets completed and cleaned up later. By ordering
+        # bar's tasks that depend on bar's do_populate_sysroot before foo's
+        # do_configure, that problem gets avoided.
+        task_index = 0
+        self.dump_prio('original priorities')
+        for task in all_tasks:
+            for index in range(task_index, self.numTasks):
+                taskid = self.prio_map[index]
+                taskname = taskid.rsplit(':', 1)[1]
+                if taskname == task:
+                    del self.prio_map[index]
+                    self.prio_map.insert(task_index, taskid)
+                    task_index += 1
+        self.dump_prio('rm_work priorities')
+
 class RunTaskEntry(object):
     def __init__(self):
         self.depends = set()
-- 
2.1.4



