From: Richard Purdie <richard.purdie@linuxfoundation.org>
To: Jianxun Zhang <jianxun.zhang@linux.intel.com>,
	bitbake-devel@lists.openembedded.org
Subject: Re: [PATCH] use multiple processes to dump signatures.
Date: Thu, 22 Dec 2016 08:59:18 +0000
Message-ID: <1482397158.9843.130.camel@linuxfoundation.org>
In-Reply-To: <1482352057-26139-1-git-send-email-jianxun.zhang@linux.intel.com>

On Wed, 2016-12-21 at 12:27 -0800, Jianxun Zhang wrote:
> This change significantly shortens the reparsing stage of the
> '-S' option.
> 
> Each file is reparsed and then dumped within a dedicated
> process. The number of running processes never exceeds the
> value of BB_NUMBER_PARSE_THREADS if it is set.
> 
> The dump_sigs() in class SignatureGeneratorBasic is _replaced_
> by a new dump_sigfn() interface, so dump_sigs() calls from outside
> and from subclasses now fall through to the implementation in the
> base class of SignatureGeneratorBasic.
> 
> Fixes [YOCTO #10352]

Thanks, I think this is heading in the right direction. 

I am a little worried, though, that this leaves OE's sstatesig.py with
a dump_sigs() function that is no longer used or connected to anything
else? Does this still write out a locked sigs file after this change?

Cheers,

Richard


> Signed-off-by: Jianxun Zhang <jianxun.zhang@linux.intel.com>
> ---
>  bitbake/lib/bb/runqueue.py | 32 +++++++++++++++++++++++++++-----
>  bitbake/lib/bb/siggen.py   |  4 ++--
>  2 files changed, 29 insertions(+), 7 deletions(-)
> 
> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
> index 2ad8aad..c7d8d53 100644
> --- a/bitbake/lib/bb/runqueue.py
> +++ b/bitbake/lib/bb/runqueue.py
> @@ -36,6 +36,7 @@ from bb import msg, data, event
>  from bb import monitordisk
>  import subprocess
>  import pickle
> +from multiprocessing import Process
>  
>  bblogger = logging.getLogger("BitBake")
>  logger = logging.getLogger("BitBake.RunQueue")
> @@ -1302,15 +1303,36 @@ class RunQueue:
>          else:
>              self.rqexe.finish()
>  
> +    def rq_dump_sigfn(self, fn, options):
> +        bb_cache = bb.cache.NoCache(self.cooker.databuilder)
> +        the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
> +        siggen = bb.parse.siggen
> +        dataCaches = self.rqdata.dataCaches
> +        siggen.dump_sigfn(fn, dataCaches, options)
> +
>      def dump_signatures(self, options):
> -        done = set()
> +        fns = set()
>          bb.note("Reparsing files to collect dependency data")
> -        bb_cache = bb.cache.NoCache(self.cooker.databuilder)
> +
>          for tid in self.rqdata.runtaskentries:
>              fn = fn_from_tid(tid)
> -            if fn not in done:
> -                the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
> -                done.add(fn)
> +            fns.add(fn)
> +
> +        max_process = int(self.cfgData.getVar("BB_NUMBER_PARSE_THREADS") or os.cpu_count() or 1)
> +        # We cannot use the real multiprocessing.Pool easily due to some local data
> +        # that can't be pickled. This is a cheap multi-process solution.
> +        launched = []
> +        while fns:
> +            if len(launched) < max_process:
> +                p = Process(target=self.rq_dump_sigfn, args=(fns.pop(), options))
> +                p.start()
> +                launched.append(p)
> +            for q in launched:
> +                # The finished processes are joined when calling is_alive()
> +                if not q.is_alive():
> +                    launched.remove(q)
> +        for p in launched:
> +                p.join()
>  
>          bb.parse.siggen.dump_sigs(self.rqdata.dataCaches, options)
>  
> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
> index b20b9cf..ae50a18 100644
> --- a/bitbake/lib/bb/siggen.py
> +++ b/bitbake/lib/bb/siggen.py
> @@ -307,8 +307,8 @@ class SignatureGeneratorBasic(SignatureGenerator):
>                  pass
>              raise err
>  
> -    def dump_sigs(self, dataCaches, options):
> -        for fn in self.taskdeps:
> +    def dump_sigfn(self, fn, dataCaches, options):
> +        if fn in self.taskdeps:
>              for task in self.taskdeps[fn]:
>                  tid = fn + ":" + task
>                  (mc, _, _) = bb.runqueue.split_tid(tid)
> -- 
> 2.7.4
> 
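
For readers following the launch loop in the quoted patch, here is a minimal standalone sketch of the same idea: one process per file, with a cap on how many run at once. It is not the bitbake code. The names dump_one() and run_bounded() and the output directory are hypothetical stand-ins, and the "fork" start method is an assumption that fits a Linux-only tool. Unlike the quoted loop, it rebuilds the list of running processes rather than removing entries while iterating over it, and it sleeps briefly between polls instead of spinning.

```python
import multiprocessing
import os
import time

# Assumption: a Unix host where the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


def dump_one(fn, out_dir):
    # Hypothetical stand-in for the per-file "reparse and dump" step.
    with open(os.path.join(out_dir, fn + ".sig"), "w") as f:
        f.write(fn)


def run_bounded(fns, out_dir, max_procs=None):
    """Run dump_one() for each name in fns, at most max_procs at a time.

    Returns the peak number of processes that were running together.
    """
    # Same fallback chain as the patch: explicit value, CPU count, then 1.
    max_procs = int(max_procs or os.cpu_count() or 1)
    pending = sorted(fns)
    running = []
    peak = 0
    while pending or running:
        # Reap finished workers by building a new list, so the list
        # being iterated over is never modified in place.
        still_running = []
        for p in running:
            if p.is_alive():
                still_running.append(p)
            else:
                p.join()
        running = still_running
        # Top up to the limit.
        while pending and len(running) < max_procs:
            p = _ctx.Process(target=dump_one, args=(pending.pop(), out_dir))
            p.start()
            running.append(p)
        peak = max(peak, len(running))
        if running:
            time.sleep(0.01)  # avoid a busy loop while workers run
    return peak
```

A call such as run_bounded(file_names, some_dir, 4) would process every name with no more than four workers alive at any moment. Plain Process objects are used here, as in the patch, because its comment notes that some of the data involved cannot be pickled for multiprocessing.Pool.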

