* [PATCH 1/1] bitbake: optimize file parsing speed
From: Dongxiao Xu @ 2010-11-17 4:10 UTC (permalink / raw)
To: poky
Build the data cache for generate_dependencies() up front, so that it
does not have to be rebuilt each time a bb file is parsed.
This optimization cuts parsing time by roughly 45% when parsing all ~800
bb files.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
bitbake/lib/bb/cooker.py | 2 ++
bitbake/lib/bb/data.py | 11 ++++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
index 33eb65e..05e6c16 100644
--- a/bitbake/lib/bb/cooker.py
+++ b/bitbake/lib/bb/cooker.py
@@ -76,6 +76,8 @@ class BBCooker:

         self.configuration.data = bb.data.init()

+        bb.data.init_data_cache(self.configuration.data)
+
         if not server:
             bb.data.setVar("BB_WORKERCONTEXT", "1", self.configuration.data)

diff --git a/bitbake/lib/bb/data.py b/bitbake/lib/bb/data.py
index fee10cc..a9e539f 100644
--- a/bitbake/lib/bb/data.py
+++ b/bitbake/lib/bb/data.py
@@ -296,17 +296,18 @@ def build_dependencies(key, keys, shelldeps, d):
     #bb.note("Variable %s references %s and calls %s" % (key, str(deps), str(execs)))
     #d.setVarFlag(key, "vardeps", deps)

-def generate_dependencies(d):
+def init_data_cache(d):
+    bb.data.keylist = set(key for key in d.keys() if not key.startswith("__"))
+    bb.data.shelldeps = set(key for key in bb.data.keylist if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))

-    keys = set(key for key in d.keys() if not key.startswith("__"))
-    shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
+def generate_dependencies(d):

     deps = {}
     taskdeps = {}

     tasklist = bb.data.getVar('__BBTASKS', d) or []
     for task in tasklist:
-        deps[task] = build_dependencies(task, keys, shelldeps, d)
+        deps[task] = build_dependencies(task, bb.data.keylist, bb.data.shelldeps, d)

         newdeps = deps[task]
         seen = set()
@@ -316,7 +317,7 @@ def generate_dependencies(d):
             newdeps = set()
             for dep in nextdeps:
                 if dep not in deps:
-                    deps[dep] = build_dependencies(dep, keys, shelldeps, d)
+                    deps[dep] = build_dependencies(dep, bb.data.keylist, bb.data.shelldeps, d)
                 newdeps |= deps[dep]
             newdeps -= seen
         taskdeps[task] = seen | newdeps
--
1.6.3.3
* [PATCH 0/1] [PULL] bitbake: optimize the file parsing speed
From: Dongxiao Xu @ 2010-11-17 4:49 UTC (permalink / raw)
To: poky
Hi Richard and Saul,
This patch optimizes the file parsing speed. Please review and pull.
Thanks!
Profiling shows that generate_dependencies() takes up a lot of the
parsing time, because the following code is run for every bb file that
is parsed:
keys = set(key for key in d.keys() if not key.startswith("__"))
shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
To optimize this, the main idea is to build the data cache for
generate_dependencies() up front, so that it does not have to be rebuilt
each time a bb file is parsed.
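To illustrate the idea, here is a minimal sketch of building the two sets once and reusing them. It assumes a toy dict-backed datastore: ToyData and compute_key_sets are hypothetical names used only for this illustration and are not part of BitBake's API; only the two set expressions are taken from the patch.

```python
# Hypothetical stand-in for a BitBake datastore; not the real bb.data API.
class ToyData:
    def __init__(self, variables, flags=None):
        self._vars = dict(variables)
        self._flags = flags or {}

    def keys(self):
        return self._vars.keys()

    def getVarFlag(self, key, flag):
        return self._flags.get(key, {}).get(flag)


def compute_key_sets(d):
    """Build the two sets that generate_dependencies() needs."""
    keys = set(key for key in d.keys() if not key.startswith("__"))
    shelldeps = set(key for key in keys
                    if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
    return keys, shelldeps


base = ToyData(
    {"PATH": "/usr/bin", "CC": "gcc", "__BBTASKS": []},
    {"PATH": {"export": "1"}, "CC": {"export": "1", "unexport": "1"}},
)

# Computed once up front, then reused for every recipe instead of being rebuilt.
cached_keys, cached_shelldeps = compute_key_sets(base)
```

The saving comes from running the two set comprehensions once instead of once per bb file. The sketch is only equivalent to the original code if the set of keys never changes after the cache is built.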
I used "bitbake -p -P" to measure the file parsing time.
Before each test run, I did "touch conf/local.conf" to ensure
that all 846 files were re-parsed.
Here are the test results:
Before optimization:
Round 1: 74.980s (discarded)
Round 2: 60.281s
Round 3: 59.824s
Round 4: 60.771s
--------------------
Average: 60.292s
After optimization:
Round 1: 45.003s (discarded)
Round 2: 33.063s
Round 3: 32.991s
Round 4: 32.043s
--------------------
Average: 32.699s
In both cases, the first result is a bit higher than the later ones. I think
this is due to a cold cache, so I discarded it.
For the remaining three rounds, I calculated the average time, and we can see
that the optimization cuts parsing time by ~46% (60.292s to 32.699s).
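The averages and the relative gain can be re-derived from the numbers above. This is just the arithmetic on rounds 2 to 4 as reported in this mail; the variable names are chosen for illustration.

```python
before = [60.281, 59.824, 60.771]   # rounds 2-4, before optimization (seconds)
after = [33.063, 32.991, 32.043]    # rounds 2-4, after optimization (seconds)

avg_before = sum(before) / len(before)
avg_after = sum(after) / len(after)

reduction = (avg_before - avg_after) / avg_before   # fraction of parse time saved
speedup = avg_before / avg_after                    # how many times faster

print("before: %.3fs  after: %.3fs" % (avg_before, avg_after))
print("time saved: %.1f%%  speedup: %.2fx" % (reduction * 100, speedup))
```

The parse time drops by a little under half, which corresponds to parsing somewhat less than twice as fast.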
Pull URL: git://git.pokylinux.org/poky-contrib.git
Branch: dxu4/perf
Browse: http://git.pokylinux.org/cgit.cgi/poky-contrib/log/?h=dxu4/perf
Thanks,
Dongxiao Xu <dongxiao.xu@intel.com>
---
Dongxiao Xu (1):
bitbake: optimize file parsing speed
bitbake/lib/bb/cooker.py | 2 ++
bitbake/lib/bb/data.py | 11 ++++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
* Re: [PATCH 1/1] bitbake: optimize file parsing speed
From: Richard Purdie @ 2010-11-19 10:27 UTC (permalink / raw)
To: Dongxiao Xu; +Cc: poky
I'm afraid this isn't going to be quite this simple although this does
prove those lines of code are a big hotspot in parsing.
Why? You're creating the key and export lists for the base configuration
data whereas the original code creates these lists for the *total*
parsed metadata. There will therefore be differences in the values held
by the two caches :(.
As an example, if you set:
FOO = "bar" in a .bb file, 'FOO' will not appear in your keywords cache.
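The mismatch can be reproduced with a small toy model. The sketch below uses plain dicts rather than BitBake's datastore, and the variable names are made up for illustration; it only shows why a key set built from the base configuration misses recipe-level variables.

```python
# Toy model: a recipe's data is the base configuration plus its own variables.
base_config = {"PATH": "/usr/bin", "CC": "gcc"}

# Cache built once from the base configuration only (what the patch does).
cached_keys = set(key for key in base_config if not key.startswith("__"))

# Parsing a .bb file adds recipe-level variables on top of the base data.
recipe_data = dict(base_config)
recipe_data["FOO"] = "bar"

# What the original code computes: keys of the *total* parsed metadata.
actual_keys = set(key for key in recipe_data if not key.startswith("__"))

missing = actual_keys - cached_keys
print(sorted(missing))   # prints ['FOO']
```

Any variable that exists only in the recipe ends up in `missing`, so dependencies on it would not be picked up from the cached set.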
Cheers,
Richard
* Re: [PATCH 1/1] bitbake: optimize file parsing speed
From: Xu, Dongxiao @ 2010-11-19 13:14 UTC (permalink / raw)
To: Richard Purdie; +Cc: poky@yoctoproject.org
Richard,
Yes, you are right, thanks for pointing it out.
Now I am trying to solve this parse time issue in another way.
We saw that the following two lines cost a lot of cycles.
keys = set(key for key in d.keys() if not key.startswith("__"))
shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
After dumping d.keys(), I found that most of the items (>90%) are variables from distro_tracking_fields.inc, which are not actually used in the normal build process.
I checked the code: some functions in utility-tasks.bbclass (related to the upstream version check) need the information in distro_tracking_fields.inc, which is why poky.conf includes this file. utility-tasks.bbclass is in turn inherited by base.bbclass, which is fundamental to poky.
I am thinking of moving the distro-checking code from utility-tasks.bbclass to distrodata.bbclass, so that such a big database (distro_tracking_fields.inc) is not pulled into the normal parsing process. Does that make sense?
I did some simple tests, and this could save about 20% of the file parsing time.
For the repeated-parsing hot spot, I will continue to investigate whether there is further room for optimization.
Thanks,
Dongxiao
* Re: [PATCH 1/1] bitbake: optimize file parsing speed
From: Qing He @ 2010-11-26 8:07 UTC (permalink / raw)
To: Richard Purdie; +Cc: poky@yoctoproject.org
On Fri, 2010-11-19 at 18:27 +0800, Richard Purdie wrote:
> I'm afraid this isn't going to be quite this simple although this does
> prove those lines of code are a big hotspot in parsing.
>
> Why? You're creating the key and export lists for the base configuration
> data whereas the original code creates these lists for the *total*
> parsed metadata. There will therefore be differences in the values held
> by the two caches :(.
Is it possible to maintain a list as variables change? Say, hook
several methods like setVar, setVarFlag, etc. to maintain something like
d["_conanicalkeys"] and d["_exportkeys"], and when this info is required,
recursively collect every "_conanicalkeys" from d and d["_data"].
It gives about a 13% speed improvement in my experiment, but it is quite
tricky and ugly, and possibly not worth the trouble.
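A rough sketch of what such bookkeeping could look like follows. It is not the experiment described above: TrackedData is a hypothetical layered store written for this illustration, it keeps the sets as attributes rather than as d["..."] entries, and it assumes that setting a flag also registers the key.

```python
class TrackedData:
    """Hypothetical layered datastore that updates its key sets on every write."""

    def __init__(self, parent=None):
        self.parent = parent
        self._vars = {}
        self._flags = {}
        self._keys = set()               # non-"__" keys touched at this layer
        self._export_candidates = set()  # keys given an "export" flag at this layer

    def setVar(self, key, value):
        self._vars[key] = value
        if not key.startswith("__"):
            self._keys.add(key)

    def setVarFlag(self, key, flag, value):
        self._flags.setdefault(key, {})[flag] = value
        if not key.startswith("__"):
            self._keys.add(key)          # assumption: a flagged key counts as a key
            if flag == "export":
                self._export_candidates.add(key)

    def getVarFlag(self, key, flag):
        # The nearest layer that defines the flag wins.
        layer = self
        while layer is not None:
            if flag in layer._flags.get(key, {}):
                return layer._flags[key][flag]
            layer = layer.parent
        return None

    def all_keys(self):
        # Recursively merge the per-layer key sets.
        keys = set(self._keys)
        if self.parent is not None:
            keys |= self.parent.all_keys()
        return keys

    def export_keys(self):
        # Only keys that ever had "export" set need the full flag check.
        candidates = set()
        layer = self
        while layer is not None:
            candidates |= layer._export_candidates
            layer = layer.parent
        return set(k for k in candidates
                   if self.getVarFlag(k, "export") and not self.getVarFlag(k, "unexport"))


base = TrackedData()
base.setVar("PATH", "/usr/bin")
base.setVarFlag("PATH", "export", "1")

recipe = TrackedData(parent=base)
recipe.setVar("FOO", "bar")
recipe.setVarFlag("PATH", "unexport", "1")
```

The export set still has to be filtered at query time, because a child layer can set "unexport" on a key that a parent exported. That is part of what makes the bookkeeping fiddly.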
Btw, it looks like much of the parsing performance regression is due
to moving python/shell code parsing/compilation to parse time? If so,
is the current parse performance considered OK?
Thanks,
Qing