* [PATCH 0/1] [PULL] bitbake: optimize the file parsing speed
@ 2010-11-17 4:49 Dongxiao Xu
2010-11-17 4:10 ` [PATCH 1/1] bitbake: optimize " Dongxiao Xu
0 siblings, 1 reply; 5+ messages in thread
From: Dongxiao Xu @ 2010-11-17 4:49 UTC (permalink / raw)
To: poky
Hi Richard and Saul,
This patch speeds up file parsing. Please review and pull. Thanks!
Profiling shows that generate_dependencies() takes up a large share of
the parsing time, because the following code runs for every bb file
that is parsed:
keys = set(key for key in d.keys() if not key.startswith("__"))
shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
The main idea is to build these two sets once, up front, as a data cache
for generate_dependencies(), so that they do not have to be rebuilt each
time a bb file is parsed.
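The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: ToyData, compute_key_sets, parse_all_uncached and parse_all_cached are hypothetical names, and ToyData only mimics the keys() and getVarFlag() calls used in the two quoted lines. As the review later in this thread points out, reusing the sets is only valid if every parsed file sees the same keys as the base data.

```python
# Toy stand-in for a BitBake datastore. ToyData and its methods are
# illustrative assumptions, not the real bb.data API.
class ToyData:
    def __init__(self, variables, flags=None):
        self._vars = dict(variables)
        self._flags = flags or {}

    def keys(self):
        return self._vars.keys()

    def getVarFlag(self, key, flag):
        return self._flags.get(key, {}).get(flag)


def compute_key_sets(d):
    """The two set constructions quoted above, wrapped in one helper."""
    keys = set(key for key in d.keys() if not key.startswith("__"))
    shelldeps = set(key for key in keys
                    if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
    return keys, shelldeps


def parse_all_uncached(datastores):
    # Original behaviour: recompute both sets for every parsed file.
    return [compute_key_sets(d) for d in datastores]


def parse_all_cached(base, datastores):
    # Proposed behaviour: compute once from the base data, reuse for every file.
    cached = compute_key_sets(base)
    return [cached for _ in datastores]
```

The two variants return the same result only when each per-file datastore has exactly the keys and flags of the base one, which is the assumption the patch relies on.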
I used "bitbake -p -P" to measure the file parsing time.
Before each run I did "touch conf/local.conf" to ensure
that all 846 files were re-parsed.
Here are the test results:
Before optimization:
Round 1: 74.980s (discarded)
Round 2: 60.281s
Round 3: 59.824s
Round 4: 60.771s
--------------------
Average: 60.292s
After optimization:
Round 1: 45.003s (discarded)
Round 2: 33.063s
Round 3: 32.991s
Round 4: 32.043s
--------------------
Average: 32.699s
In both cases the first result is a bit higher than the later ones. I think
this is due to a cold cache, so I left it out of the average.
The averages are taken over the remaining three rounds, and they show that
the optimization nearly halves the parsing time (about 46% less).
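The averages and the relative saving can be rechecked from the round timings listed above. A small sketch, using only the numbers from rounds 2 to 4 (the variable names are chosen here for illustration):

```python
from statistics import mean

# Rounds 2-4 from the results above, in seconds (round 1 left out as cold cache).
before = [60.281, 59.824, 60.771]
after = [33.063, 32.991, 32.043]

avg_before = mean(before)
avg_after = mean(after)

# Fraction of parsing time saved, and the equivalent speedup factor.
reduction = (avg_before - avg_after) / avg_before
speedup = avg_before / avg_after

print(f"before: {avg_before:.3f}s, after: {avg_after:.3f}s")
print(f"time saved: {reduction:.1%}, speedup: {speedup:.2f}x")
```

This gives a saving of a little under 46%, or a speedup of roughly 1.8x.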
Pull URL: git://git.pokylinux.org/poky-contrib.git
Branch: dxu4/perf
Browse: http://git.pokylinux.org/cgit.cgi/poky-contrib/log/?h=dxu4/perf
Thanks,
Dongxiao Xu <dongxiao.xu@intel.com>
---
Dongxiao Xu (1):
bitbake: optimize file parsing speed
bitbake/lib/bb/cooker.py | 2 ++
bitbake/lib/bb/data.py | 11 ++++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
* [PATCH 1/1] bitbake: optimize file parsing speed
2010-11-17 4:49 [PATCH 0/1] [PULL] bitbake: optimize the file parsing speed Dongxiao Xu
@ 2010-11-17 4:10 ` Dongxiao Xu
2010-11-19 10:27 ` Richard Purdie
0 siblings, 1 reply; 5+ messages in thread
From: Dongxiao Xu @ 2010-11-17 4:10 UTC (permalink / raw)
To: poky
build some data cache for generate_dependencies() on hand, and later
each time when parsing the bb file, we do not need to build them again
and again.

This optimization could get about 50% speed gain when parsing all ~800
bb files.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
bitbake/lib/bb/cooker.py | 2 ++
bitbake/lib/bb/data.py | 11 ++++++-----
2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
index 33eb65e..05e6c16 100644
--- a/bitbake/lib/bb/cooker.py
+++ b/bitbake/lib/bb/cooker.py
@@ -76,6 +76,8 @@ class BBCooker:
 
         self.configuration.data = bb.data.init()
 
+        bb.data.init_data_cache(self.configuration.data)
+
         if not server:
             bb.data.setVar("BB_WORKERCONTEXT", "1", self.configuration.data)
 
diff --git a/bitbake/lib/bb/data.py b/bitbake/lib/bb/data.py
index fee10cc..a9e539f 100644
--- a/bitbake/lib/bb/data.py
+++ b/bitbake/lib/bb/data.py
@@ -296,17 +296,18 @@ def build_dependencies(key, keys, shelldeps, d):
     #bb.note("Variable %s references %s and calls %s" % (key, str(deps), str(execs)))
     #d.setVarFlag(key, "vardeps", deps)
 
-def generate_dependencies(d):
+def init_data_cache(d):
+    bb.data.keylist = set(key for key in d.keys() if not key.startswith("__"))
+    bb.data.shelldeps = set(key for key in bb.data.keylist if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
 
-    keys = set(key for key in d.keys() if not key.startswith("__"))
-    shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
+def generate_dependencies(d):
 
     deps = {}
     taskdeps = {}
 
     tasklist = bb.data.getVar('__BBTASKS', d) or []
     for task in tasklist:
-        deps[task] = build_dependencies(task, keys, shelldeps, d)
+        deps[task] = build_dependencies(task, bb.data.keylist, bb.data.shelldeps, d)
 
         newdeps = deps[task]
         seen = set()
@@ -316,7 +317,7 @@ def generate_dependencies(d):
             newdeps = set()
             for dep in nextdeps:
                 if dep not in deps:
-                    deps[dep] = build_dependencies(dep, keys, shelldeps, d)
+                    deps[dep] = build_dependencies(dep, bb.data.keylist, bb.data.shelldeps, d)
                 newdeps |= deps[dep]
             newdeps -= seen
         taskdeps[task] = seen | newdeps
-- 
1.6.3.3
* Re: [PATCH 1/1] bitbake: optimize file parsing speed
2010-11-17 4:10 ` [PATCH 1/1] bitbake: optimize " Dongxiao Xu
@ 2010-11-19 10:27 ` Richard Purdie
2010-11-19 13:14 ` Xu, Dongxiao
2010-11-26 8:07 ` Qing He
0 siblings, 2 replies; 5+ messages in thread
From: Richard Purdie @ 2010-11-19 10:27 UTC (permalink / raw)
To: Dongxiao Xu; +Cc: poky
On Wed, 2010-11-17 at 12:10 +0800, Dongxiao Xu wrote:
> build some data cache for generate_dependencies() on hand, and later
> each time when parsing the bb file, we do not need to build them again
> and again.
>
> This optimization could get about 50% speed gain when parsing all ~800
> bb files.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
> bitbake/lib/bb/cooker.py | 2 ++
> bitbake/lib/bb/data.py | 11 ++++++-----
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
> index 33eb65e..05e6c16 100644
> --- a/bitbake/lib/bb/cooker.py
> +++ b/bitbake/lib/bb/cooker.py
> @@ -76,6 +76,8 @@ class BBCooker:
>
>          self.configuration.data = bb.data.init()
>
> +        bb.data.init_data_cache(self.configuration.data)
> +
>          if not server:
>              bb.data.setVar("BB_WORKERCONTEXT", "1", self.configuration.data)
>
> diff --git a/bitbake/lib/bb/data.py b/bitbake/lib/bb/data.py
> index fee10cc..a9e539f 100644
> --- a/bitbake/lib/bb/data.py
> +++ b/bitbake/lib/bb/data.py
> @@ -296,17 +296,18 @@ def build_dependencies(key, keys, shelldeps, d):
>      #bb.note("Variable %s references %s and calls %s" % (key, str(deps), str(execs)))
>      #d.setVarFlag(key, "vardeps", deps)
>
> -def generate_dependencies(d):
> +def init_data_cache(d):
> +    bb.data.keylist = set(key for key in d.keys() if not key.startswith("__"))
> +    bb.data.shelldeps = set(key for key in bb.data.keylist if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
>
> -    keys = set(key for key in d.keys() if not key.startswith("__"))
> -    shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
> +def generate_dependencies(d):
>
>      deps = {}
>      taskdeps = {}
>
>      tasklist = bb.data.getVar('__BBTASKS', d) or []
>      for task in tasklist:
> -        deps[task] = build_dependencies(task, keys, shelldeps, d)
> +        deps[task] = build_dependencies(task, bb.data.keylist, bb.data.shelldeps, d)
>
>          newdeps = deps[task]
>          seen = set()
> @@ -316,7 +317,7 @@ def generate_dependencies(d):
>              newdeps = set()
>              for dep in nextdeps:
>                  if dep not in deps:
> -                    deps[dep] = build_dependencies(dep, keys, shelldeps, d)
> +                    deps[dep] = build_dependencies(dep, bb.data.keylist, bb.data.shelldeps, d)
>                  newdeps |= deps[dep]
>              newdeps -= seen
>          taskdeps[task] = seen | newdeps

I'm afraid this isn't going to be quite this simple although this does
prove those lines of code are a big hotspot in parsing.

Why? You're creating the key and export lists for the base configuration
data whereas the original code creates these lists for the *total*
parsed metadata. There will therefore be differences in the values held
by the two caches :(.

As an example, if you set:

FOO = "bar"

in a .bb file, 'FOO' will not appear in your keywords cache.

Cheers,

Richard
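Richard's FOO example can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: ToyData is a hypothetical layered datastore (a recipe layer on top of the base configuration), not the real bb.data implementation, and visible_keys is a made-up helper that applies the same "__" filter as the patch.

```python
class ToyData:
    """Toy layered datastore (an assumption for illustration, not bb.data)."""

    def __init__(self, parent=None):
        self._vars = {}
        self._parent = parent

    def setVar(self, key, value):
        self._vars[key] = value

    def keys(self):
        # A layer sees its own variables plus everything its parent sees.
        inherited = set(self._parent.keys()) if self._parent else set()
        return inherited | set(self._vars)


def visible_keys(d):
    # Same filter as in the patch: drop internal "__" variables.
    return set(key for key in d.keys() if not key.startswith("__"))


# Base configuration, as seen when the cache would be initialised.
base = ToyData()
base.setVar("DISTRO", "poky")
base.setVar("__BBTASKS", [])
cached_keys = visible_keys(base)

# A recipe adds FOO = "bar" on top of the base configuration.
recipe = ToyData(parent=base)
recipe.setVar("FOO", "bar")
fresh_keys = visible_keys(recipe)

missing = fresh_keys - cached_keys
print(missing)  # {'FOO'}
```

The set computed once from the base data never learns about FOO, while the per-file computation does, which is the discrepancy described above.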
* Re: [PATCH 1/1] bitbake: optimize file parsing speed
2010-11-19 10:27 ` Richard Purdie
@ 2010-11-19 13:14 ` Xu, Dongxiao
2010-11-26 8:07 ` Qing He
1 sibling, 0 replies; 5+ messages in thread
From: Xu, Dongxiao @ 2010-11-19 13:14 UTC (permalink / raw)
To: Richard Purdie; +Cc: poky@yoctoproject.org
Richard Purdie wrote:
> On Wed, 2010-11-17 at 12:10 +0800, Dongxiao Xu wrote:
>> build some data cache for generate_dependencies() on hand, and later
>> each time when parsing the bb file, we do not need to build them
>> again and again.
>>
>> This optimization could get about 50% speed gain when parsing all
>> ~800 bb files.
>>
>> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
>> ---
>> bitbake/lib/bb/cooker.py | 2 ++
>> bitbake/lib/bb/data.py | 11 ++++++-----
>> 2 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
>> index 33eb65e..05e6c16 100644
>> --- a/bitbake/lib/bb/cooker.py
>> +++ b/bitbake/lib/bb/cooker.py
>> @@ -76,6 +76,8 @@ class BBCooker:
>>
>>          self.configuration.data = bb.data.init()
>>
>> +        bb.data.init_data_cache(self.configuration.data)
>> +
>>          if not server:
>>              bb.data.setVar("BB_WORKERCONTEXT", "1", self.configuration.data)
>>
>> diff --git a/bitbake/lib/bb/data.py b/bitbake/lib/bb/data.py
>> index fee10cc..a9e539f 100644
>> --- a/bitbake/lib/bb/data.py
>> +++ b/bitbake/lib/bb/data.py
>> @@ -296,17 +296,18 @@ def build_dependencies(key, keys, shelldeps, d):
>>      #bb.note("Variable %s references %s and calls %s" % (key, str(deps), str(execs)))
>>      #d.setVarFlag(key, "vardeps", deps)
>>
>> -def generate_dependencies(d):
>> +def init_data_cache(d):
>> +    bb.data.keylist = set(key for key in d.keys() if not key.startswith("__"))
>> +    bb.data.shelldeps = set(key for key in bb.data.keylist if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
>>
>> -    keys = set(key for key in d.keys() if not key.startswith("__"))
>> -    shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
>> +def generate_dependencies(d):
>>
>>      deps = {}
>>      taskdeps = {}
>>
>>      tasklist = bb.data.getVar('__BBTASKS', d) or []
>>      for task in tasklist:
>> -        deps[task] = build_dependencies(task, keys, shelldeps, d)
>> +        deps[task] = build_dependencies(task, bb.data.keylist, bb.data.shelldeps, d)
>>
>>          newdeps = deps[task]
>>          seen = set()
>> @@ -316,7 +317,7 @@ def generate_dependencies(d):
>>              newdeps = set()
>>              for dep in nextdeps:
>>                  if dep not in deps:
>> -                    deps[dep] = build_dependencies(dep, keys, shelldeps, d)
>> +                    deps[dep] = build_dependencies(dep, bb.data.keylist, bb.data.shelldeps, d)
>>                  newdeps |= deps[dep]
>>              newdeps -= seen
>>          taskdeps[task] = seen | newdeps
>
> I'm afraid this isn't going to be quite this simple although this
> does prove those lines of code are a big hotspot in parsing.
>
> Why? You're creating the key and export lists for the base
> configuration data whereas the original code creates these lists for
> the *total* parsed metadata. There will therefore be differences in
> the values held by the two caches :(.
>
> As an example, if you set:
>
> FOO = "bar"
>
> in a .bb file, 'FOO' will not appear in your keywords cache.
>
> Cheers,
>
> Richard

Richard,

Yes, you are right, thanks for pointing it out.

I am now trying to solve this parse time issue in another way. We saw
that the following two lines cost a lot of cycles:

keys = set(key for key in d.keys() if not key.startswith("__"))
shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))

After dumping out d.keys(), I found that most of the items (>90%) are
variables from distro_tracking_fields.inc, which are not actually used
in the normal build process.

I checked the code: some functions in utility-tasks.bbclass (related to
the upstream version check) need the information in
distro_tracking_fields.inc, which is why poky.conf includes this file.
utility-tasks.bbclass is in turn inherited by base.bbclass, which is
fundamental to poky.

I am thinking of moving the distro-checking code from
utility-tasks.bbclass to distrodata.bbclass, so that this large database
(distro_tracking_fields.inc) is not pulled into the normal parsing
process. Does that make sense? I did some simple tests, and this could
save about 20% of the file parsing time.

As for the repeated-parsing hot spot, I will continue to investigate
whether it can be optimized.

Thanks,
Dongxiao
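The reason the size of d.keys() matters is that the two set constructions visit every key for every parsed file. A rough counting model of that effect is sketched below. Everything in it is an assumption made for illustration: CountingData is a hypothetical stand-in for the datastore, the TRACKING_FIELD_ and VAR_ names are invented, and the 100/900 split merely mirrors the ">90%" observation. It counts flag lookups rather than measuring time, so it says nothing about the 20% figure quoted above.

```python
class CountingData:
    """Toy datastore that counts getVarFlag calls (illustrative only)."""

    def __init__(self, names):
        self._names = list(names)
        self.flag_lookups = 0

    def keys(self):
        return self._names

    def getVarFlag(self, key, flag):
        self.flag_lookups += 1
        return None  # no variable is exported in this toy model


def key_sets(d):
    # The two lines identified as the hot spot.
    keys = set(key for key in d.keys() if not key.startswith("__"))
    shelldeps = set(key for key in keys
                    if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
    return keys, shelldeps


# Hypothetical split: 100 ordinary variables, 900 tracking-field variables.
core = [f"VAR_{i}" for i in range(100)]
tracking = [f"TRACKING_FIELD_{i}" for i in range(900)]

full = CountingData(core + tracking)
key_sets(full)

slim = CountingData(core)
key_sets(slim)

print(full.flag_lookups, slim.flag_lookups)  # 1000 100
```

In this model the per-file work is proportional to the number of keys, so keeping the tracking fields out of the normal datastore shrinks that part of the cost by the same factor.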
* Re: [PATCH 1/1] bitbake: optimize file parsing speed
2010-11-19 10:27 ` Richard Purdie
2010-11-19 13:14 ` Xu, Dongxiao
@ 2010-11-26 8:07 ` Qing He
1 sibling, 0 replies; 5+ messages in thread
From: Qing He @ 2010-11-26 8:07 UTC (permalink / raw)
To: Richard Purdie; +Cc: poky@yoctoproject.org
On Fri, 2010-11-19 at 18:27 +0800, Richard Purdie wrote:
> I'm afraid this isn't going to be quite this simple although this does
> prove those lines of code are a big hotspot in parsing.
>
> Why? You're creating the key and export lists for the base configuration
> data whereas the original code creates these lists for the *total*
> parsed metadata. There will therefore be differences in the values held
> by the two caches :(.

Is it possible to maintain a list as variables change? Say, hook
several methods such as setVar, setVarFlag, etc., add something like
d["_conanicalkeys"] and d["_exportkeys"], and when this information is
required, recursively collect every "_conanicalkeys" from d and
d["_data"].

This gave about a 13% speed improvement in my experiment, but it is
quite tricky and ugly, and possibly not worth the trouble.

By the way, it looks like much of the parsing performance regression is
due to moving the python/shell code parsing/compilation to parse time?
If so, is the parsing performance acceptable as it stands?

Thanks,
Qing
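The idea above of maintaining the key lists as variables change could look roughly like the sketch below. It is a guess at the shape of such a scheme, not the experiment described in the message: TrackedData, _keys and _export_touched are hypothetical names, the parent link stands in for the nested d["_data"] layers, and variable deletion is not handled.

```python
class TrackedData:
    """Toy layered datastore; names are illustrative, not the bb.data API."""

    def __init__(self, parent=None):
        self._parent = parent
        self._vars = {}
        self._flags = {}
        self._keys = set()            # non-"__" keys defined at this layer
        self._export_touched = set()  # keys whose export/unexport flag was set here

    def _layers(self):
        # Walk from this layer up through its parents.
        layer = self
        while layer is not None:
            yield layer
            layer = layer._parent

    def setVar(self, key, value):
        self._vars[key] = value
        if not key.startswith("__"):
            self._keys.add(key)

    def setVarFlag(self, key, flag, value):
        self._flags.setdefault(key, {})[flag] = value
        if not key.startswith("__"):
            self._keys.add(key)
            if flag in ("export", "unexport"):
                self._export_touched.add(key)

    def getVarFlag(self, key, flag):
        # The nearest layer that sets the flag wins.
        for layer in self._layers():
            if flag in layer._flags.get(key, {}):
                return layer._flags[key][flag]
        return None

    def all_keys(self):
        # Union of the per-layer key sets; no scan over every variable.
        return set().union(*(layer._keys for layer in self._layers()))

    def shelldeps(self):
        # Only keys that ever had export/unexport set need to be checked.
        candidates = set().union(*(layer._export_touched for layer in self._layers()))
        return set(key for key in candidates
                   if self.getVarFlag(key, "export") and not self.getVarFlag(key, "unexport"))


base = TrackedData()
base.setVar("PATH", "/usr/bin")
base.setVarFlag("PATH", "export", "1")
base.setVar("__BBTASKS", [])

recipe = TrackedData(parent=base)
recipe.setVar("FOO", "bar")
recipe.setVarFlag("FOO", "export", "1")
recipe.setVarFlag("PATH", "unexport", "1")

print(sorted(recipe.all_keys()))   # ['FOO', 'PATH']
print(sorted(recipe.shelldeps()))  # ['FOO']
```

The bookkeeping moves the cost from every generate_dependencies() call to every setVar and setVarFlag call, which may be part of why the approach was described as tricky.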