Quick hack for profiling tasks

All of lore.kernel.org
 help / color / mirror / Atom feed

* Quick hack for profiling tasks
@ 2011-02-01  0:28 Richard Purdie
  2011-02-01  0:40 ` Richard Purdie
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Purdie @ 2011-02-01  0:28 UTC (permalink / raw)
  To: poky

Hi,

One thing that is bugging me whilst I've been debugging some issues
we're having with the libc/libgcc package dependency issue is how long
do_package takes for libc. The question is where does it spend the time?
Answer, I have no idea.

I hacked together the patch below to find out. Its ugly and uses the
boilerplate profiling code from cooker, cut and pasted here to profile
the actual tasks that run.

I've yet to look at the results but it should allow us to optimise the
python tasks a bit if we can see where they spend time. I'm hoping this
lets others look at that too and also it give us some hints as to how we
might improve the core when turning on profiling in bitbake.

Cheers,

Richard


diff --git a/bitbake/lib/bb/build.py b/bitbake/lib/bb/build.py
index 42d1726..02c0a08 100644
--- a/bitbake/lib/bb/build.py
+++ b/bitbake/lib/bb/build.py
@@ -365,7 +365,36 @@ def exec_task(fn, task, d):
         if d.getVarFlag(task, "quieterrors") is not None:
             quieterr = True
 
-        return _exec_task(fn, task, d, quieterr)
+        profbasename = os.path.basename(fn) + "-" + task
+        try:
+            import cProfile as profile
+        except:
+            import profile
+        prof = profile.Profile()
+
+        ret = profile.Profile.runcall(prof, _exec_task, fn, task, d, quieterr)
+
+        prof.dump_stats("profile-%s.log" % (profbasename))
+
+        # Redirect stdout to capture profile information
+        pout = open('profile-%s.log.processed' % (profbasename), 'w')
+        so = sys.stdout.fileno()
+        orig_so = os.dup(sys.stdout.fileno())
+        os.dup2(pout.fileno(), so)
+   
+        import pstats
+        p = pstats.Stats('profile-%s.log' % (profbasename))
+        p.sort_stats('time')
+        p.print_stats()
+        p.print_callers()
+        p.sort_stats('cumulative')
+        p.print_stats()
+
+        os.dup2(orig_so, so)
+        pout.flush()
+        pout.close()  
+
+        return ret
     except Exception:
         from traceback import format_exc
         if not quieterr:



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Quick hack for profiling tasks
  2011-02-01  0:28 Quick hack for profiling tasks Richard Purdie
@ 2011-02-01  0:40 ` Richard Purdie
  2011-02-01  1:43   ` Mark Hatle
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Purdie @ 2011-02-01  0:40 UTC (permalink / raw)
  To: poky

On Tue, 2011-02-01 at 00:28 +0000, Richard Purdie wrote:
> One thing that is bugging me whilst I've been debugging some issues
> we're having with the libc/libgcc package dependency issue is how long
> do_package takes for libc. The question is where does it spend the time?
> Answer, I have no idea.
> 
> I hacked together the patch below to find out. Its ugly and uses the
> boilerplate profiling code from cooker, cut and pasted here to profile
> the actual tasks that run.
> 
> I've yet to look at the results but it should allow us to optimise the
> python tasks a bit if we can see where they spend time. I'm hoping this
> lets others look at that too and also it give us some hints as to how we
> might improve the core when turning on profiling in bitbake.

For eglibc this worked out as:

Tue Feb  1 00:33:21 2011    profile-eglibc_2.12.bb-do_package.log

         8339733 function calls (8001600 primitive calls) in 877.972 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3206  321.887    0.100  322.422    0.101 package_do_filedeps:12(process_deps)
      403  311.208    0.772  311.208    0.772 {posix.waitpid}
   134054   69.860    0.001   69.860    0.001 {method 'read' of 'file' objects}
   225554   23.367    0.000   23.367    0.000 {posix.stat}
      866   20.279    0.023   20.279    0.023 {posix.system}
    85562   19.406    0.000   19.406    0.000 {posix.chmod}
   168083   16.691    0.000   16.691    0.000 {posix.lstat}
    25824   14.399    0.001   14.399    0.001 {posix.rename}
    55391   13.731    0.000   13.731    0.000 {open}
     5325    9.019    0.002    9.019    0.002 {posix.popen}
     2279    5.490    0.002    5.490    0.002 {method 'readlines' of 'file' objects}
     6403    5.187    0.001    6.346    0.001 insane.bbclass:1(package_qa_hash_style)
    19214    5.046    0.000    5.046    0.000 {posix.mkdir}

so its spending a third of the time in package_do_filedeps(), a lot of
which is in waitpid waiting for the process that was spawned.

Mark: Is there a way we could batch up the information rather than go
file by file? I'm going to look at this for other areas to improve too
but thats obviously one worthy of attention.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Quick hack for profiling tasks
  2011-02-01  0:40 ` Richard Purdie
@ 2011-02-01  1:43   ` Mark Hatle
  2011-02-01 11:05     ` Richard Purdie
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Hatle @ 2011-02-01  1:43 UTC (permalink / raw)
  To: Richard Purdie; +Cc: poky

On 1/31/11 6:40 PM, Richard Purdie wrote:
> On Tue, 2011-02-01 at 00:28 +0000, Richard Purdie wrote:
>> One thing that is bugging me whilst I've been debugging some issues
>> we're having with the libc/libgcc package dependency issue is how long
>> do_package takes for libc. The question is where does it spend the time?
>> Answer, I have no idea.
>>
>> I hacked together the patch below to find out. Its ugly and uses the
>> boilerplate profiling code from cooker, cut and pasted here to profile
>> the actual tasks that run.
>>
>> I've yet to look at the results but it should allow us to optimise the
>> python tasks a bit if we can see where they spend time. I'm hoping this
>> lets others look at that too and also it give us some hints as to how we
>> might improve the core when turning on profiling in bitbake.
> 
> For eglibc this worked out as:
> 
> Tue Feb  1 00:33:21 2011    profile-eglibc_2.12.bb-do_package.log
> 
>          8339733 function calls (8001600 primitive calls) in 877.972 CPU seconds
> 
>    Ordered by: internal time
> 
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      3206  321.887    0.100  322.422    0.101 package_do_filedeps:12(process_deps)
>       403  311.208    0.772  311.208    0.772 {posix.waitpid}
>    134054   69.860    0.001   69.860    0.001 {method 'read' of 'file' objects}
>    225554   23.367    0.000   23.367    0.000 {posix.stat}
>       866   20.279    0.023   20.279    0.023 {posix.system}
>     85562   19.406    0.000   19.406    0.000 {posix.chmod}
>    168083   16.691    0.000   16.691    0.000 {posix.lstat}
>     25824   14.399    0.001   14.399    0.001 {posix.rename}
>     55391   13.731    0.000   13.731    0.000 {open}
>      5325    9.019    0.002    9.019    0.002 {posix.popen}
>      2279    5.490    0.002    5.490    0.002 {method 'readlines' of 'file' objects}
>      6403    5.187    0.001    6.346    0.001 insane.bbclass:1(package_qa_hash_style)
>     19214    5.046    0.000    5.046    0.000 {posix.mkdir}
> 
> so its spending a third of the time in package_do_filedeps(), a lot of
> which is in waitpid waiting for the process that was spawned.
> 
> Mark: Is there a way we could batch up the information rather than go
> file by file? I'm going to look at this for other areas to improve too
> but thats obviously one worthy of attention.

The way the routine works today is via a script call perfile_rpmdeps.sh.  The
package.bbclass calls this script twice for each package-split.

I.e. if we have base, base-dbg and base-libs, it will run for a total of 6
times.  Each pair it is simply passed the path to the packages-split directory.

Within the script itself, it is doing a find operation:

        find "$@" | process $process_type

The output of the file is passed to "process" which is just a wrapper that calls
the rpmdeps program with the correct parameters.  We could optimize this a bit
by ignoring directories and symlinks.  But we still want to process all of the
files in the system.

Another optimization (that we do NOT have) that is done by default in RPM, is to
only process files that are +x.  We have chosen not to do this as most of our
libraries are not set +x.  An alternative is to use 'file' and check the type of
each file, however identifying the file type is likely to take longer then
simply running the per-file deps commands.  Another possible optimization is
only scan certain directories (or the opposite, skip certain directories...) the
only issue here is missing files that may be dlopened or loaded via RPATH
because they are in a non-standard location.

So what I'd recommend is we start by adding "-type f" to the find.  That is
likely to help some.

Maybe then add a check for either sitting in /lib or /usr/lib _or_ mode is +x?
We would likely need to audit the system somehow and tag ELF files that are
neither...

--Mark

> Cheers,
> 
> Richard
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Quick hack for profiling tasks
  2011-02-01  1:43   ` Mark Hatle
@ 2011-02-01 11:05     ` Richard Purdie
  2011-02-01 15:38       ` Mark Hatle
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Purdie @ 2011-02-01 11:05 UTC (permalink / raw)
  To: Mark Hatle; +Cc: poky

On Mon, 2011-01-31 at 19:43 -0600, Mark Hatle wrote:
> On 1/31/11 6:40 PM, Richard Purdie wrote:
> > On Tue, 2011-02-01 at 00:28 +0000, Richard Purdie wrote:
> >> One thing that is bugging me whilst I've been debugging some issues
> >> we're having with the libc/libgcc package dependency issue is how long
> >> do_package takes for libc. The question is where does it spend the time?
> >> Answer, I have no idea.
> >>
> >> I hacked together the patch below to find out. Its ugly and uses the
> >> boilerplate profiling code from cooker, cut and pasted here to profile
> >> the actual tasks that run.
> >>
> >> I've yet to look at the results but it should allow us to optimise the
> >> python tasks a bit if we can see where they spend time. I'm hoping this
> >> lets others look at that too and also it give us some hints as to how we
> >> might improve the core when turning on profiling in bitbake.
> > 
> > For eglibc this worked out as:
> > 
> > Tue Feb  1 00:33:21 2011    profile-eglibc_2.12.bb-do_package.log
> > 
> >          8339733 function calls (8001600 primitive calls) in 877.972 CPU seconds
> > 
> >    Ordered by: internal time
> > 
> >    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> >      3206  321.887    0.100  322.422    0.101 package_do_filedeps:12(process_deps)
> >       403  311.208    0.772  311.208    0.772 {posix.waitpid}
> >    134054   69.860    0.001   69.860    0.001 {method 'read' of 'file' objects}
> >    225554   23.367    0.000   23.367    0.000 {posix.stat}
> >       866   20.279    0.023   20.279    0.023 {posix.system}
> >     85562   19.406    0.000   19.406    0.000 {posix.chmod}
> >    168083   16.691    0.000   16.691    0.000 {posix.lstat}
> >     25824   14.399    0.001   14.399    0.001 {posix.rename}
> >     55391   13.731    0.000   13.731    0.000 {open}
> >      5325    9.019    0.002    9.019    0.002 {posix.popen}
> >      2279    5.490    0.002    5.490    0.002 {method 'readlines' of 'file' objects}
> >      6403    5.187    0.001    6.346    0.001 insane.bbclass:1(package_qa_hash_style)
> >     19214    5.046    0.000    5.046    0.000 {posix.mkdir}
> > 
> > so its spending a third of the time in package_do_filedeps(), a lot of
> > which is in waitpid waiting for the process that was spawned.
> > 
> > Mark: Is there a way we could batch up the information rather than go
> > file by file? I'm going to look at this for other areas to improve too
> > but thats obviously one worthy of attention.
> 
> The way the routine works today is via a script call perfile_rpmdeps.sh.  The
> package.bbclass calls this script twice for each package-split.
> 
> I.e. if we have base, base-dbg and base-libs, it will run for a total of 6
> times.  Each pair it is simply passed the path to the packages-split directory.
> 
> Within the script itself, it is doing a find operation:
> 
>         find "$@" | process $process_type
> 
> The output of the file is passed to "process" which is just a wrapper that calls
> the rpmdeps program with the correct parameters.  We could optimize this a bit
> by ignoring directories and symlinks.  But we still want to process all of the
> files in the system.
> 
> Another optimization (that we do NOT have) that is done by default in RPM, is to
> only process files that are +x.  We have chosen not to do this as most of our
> libraries are not set +x.  An alternative is to use 'file' and check the type of
> each file, however identifying the file type is likely to take longer then
> simply running the per-file deps commands.  Another possible optimization is
> only scan certain directories (or the opposite, skip certain directories...) the
> only issue here is missing files that may be dlopened or loaded via RPATH
> because they are in a non-standard location.
> 
> So what I'd recommend is we start by adding "-type f" to the find.  That is
> likely to help some.
> 
> Maybe then add a check for either sitting in /lib or /usr/lib _or_ mode is +x?
> We would likely need to audit the system somehow and tag ELF files that are
> neither...

I did a little investigating and my brain dump on it follows...

I noticed that rpmdeps doesn't help us with any locale information in
the libc case so I tried this patch:

diff --git a/meta/classes/package.bbclass b/meta/classes/package.bbclass
index 856858c..476a84a 100644
--- a/meta/classes/package.bbclass
+++ b/meta/classes/package.bbclass
@@ -607,7 +607,7 @@ python package_do_filedeps() {
 
        # Determine dependencies
        for pkg in packages.split():
-               if pkg.endswith('-dbg'):
+               if pkg.endswith('-dbg') or pkg.find('-locale-') != -1 or pkg.find('-localedata-') != -1 or pkg.find('-gconv-') != -1  or pkg.find('-charmap-') != -1:
                        continue
 
                # Process provides

The result was:

Tue Feb  1 10:06:46 2011    profile-eglibc_2.12.bb-do_package.log

         8106962 function calls (7768817 primitive calls) in 594.886 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      403  308.929    0.767  308.929    0.767 {posix.waitpid}
   134054   72.907    0.001   72.907    0.001 {method 'read' of 'file' objects}
      828   34.643    0.042   34.795    0.042 package_do_filedeps:12(process_deps)
      866   29.347    0.034   29.347    0.034 {posix.system}
   225558   23.459    0.000   23.459    0.000 {posix.stat}
    85562   18.980    0.000   18.980    0.000 {posix.chmod}
   171317   17.198    0.000   17.198    0.000 {posix.lstat}
    55393   14.378    0.000   14.378    0.000 {open}
    25824   13.719    0.001   13.719    0.001 {posix.rename}
     2280    5.383    0.002    5.383    0.002 {method 'readlines' of 'file' objects}
     6403    4.877    0.001    5.762    0.001 insane.bbclass:1(package_qa_hash_style)
    19214    4.843    0.000    4.843    0.000 {posix.mkdir}
     2947    3.867    0.001    3.867    0.001 {posix.popen}

So overall we dropped from 878 seconds to 595 seconds and
package_do_filedeps:12(process_deps) dropped from 322 to 34 seconds. 

FWIW, the waitpid above is from the binary locale generation, not from
filedeps and we know that is "slow" in the libc case although the cross
locale generation speeds it up a lot. The reason process_deps is showing
so badly on the profile is not because that python function is slow (as
it usually the case) but because its reading from a pipe which blocks on
the output of the rpmdeps process.

I did take a look at the perfile_rpmdeps.sh script and rpmdeps itself. I
suspect we might be faster if we called "rpmdeps --requires --provides
-v" on each file ourselves and handle the prefix the -v option adds to
differentiate between the requires and provides. There is a sed
operation in the script I didn't 100% understand that we'd need to
account for too. Watching cpu usage whilst manually running the script
was not encouraging :/.

So these findings along with your thoughts should give us some avenues
to explore. Do you thing skipping locale packages is ok to do? Maybe
kernel modules too?

There are also other things to pick off in that profile list such as the
200,000 stat calls (from sstate?), the insane check thats showing up and
what all the read calls are doing.

Cheers,

Richard




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Quick hack for profiling tasks
  2011-02-01 11:05     ` Richard Purdie
@ 2011-02-01 15:38       ` Mark Hatle
  2011-02-02 14:28         ` Richard Purdie
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Hatle @ 2011-02-01 15:38 UTC (permalink / raw)
  To: Richard Purdie; +Cc: poky

On 2/1/11 5:05 AM, Richard Purdie wrote:
> On Mon, 2011-01-31 at 19:43 -0600, Mark Hatle wrote:
>> On 1/31/11 6:40 PM, Richard Purdie wrote:
>>> On Tue, 2011-02-01 at 00:28 +0000, Richard Purdie wrote:

...

>>> Mark: Is there a way we could batch up the information rather than go
>>> file by file? I'm going to look at this for other areas to improve too
>>> but thats obviously one worthy of attention.
>>
>> The way the routine works today is via a script call perfile_rpmdeps.sh.  The
>> package.bbclass calls this script twice for each package-split.
>>
>> I.e. if we have base, base-dbg and base-libs, it will run for a total of 6
>> times.  Each pair it is simply passed the path to the packages-split directory.
>>
>> Within the script itself, it is doing a find operation:
>>
>>         find "$@" | process $process_type
>>
>> The output of the file is passed to "process" which is just a wrapper that calls
>> the rpmdeps program with the correct parameters.  We could optimize this a bit
>> by ignoring directories and symlinks.  But we still want to process all of the
>> files in the system.
>>
>> Another optimization (that we do NOT have) that is done by default in RPM, is to
>> only process files that are +x.  We have chosen not to do this as most of our
>> libraries are not set +x.  An alternative is to use 'file' and check the type of
>> each file, however identifying the file type is likely to take longer then
>> simply running the per-file deps commands.  Another possible optimization is
>> only scan certain directories (or the opposite, skip certain directories...) the
>> only issue here is missing files that may be dlopened or loaded via RPATH
>> because they are in a non-standard location.
>>
>> So what I'd recommend is we start by adding "-type f" to the find.  That is
>> likely to help some.
>>
>> Maybe then add a check for either sitting in /lib or /usr/lib _or_ mode is +x?
>> We would likely need to audit the system somehow and tag ELF files that are
>> neither...
> 
> I did a little investigating and my brain dump on it follows...
> 
> I noticed that rpmdeps doesn't help us with any locale information in
> the libc case so I tried this patch:

I'm still used to the RPM world where locales are included in the main package
and "tagged" so they will or won't be included automatically.  I forget they are
separated out in Poky by default.

So skipping them makes sense to me...

> diff --git a/meta/classes/package.bbclass b/meta/classes/package.bbclass
> index 856858c..476a84a 100644
> --- a/meta/classes/package.bbclass
> +++ b/meta/classes/package.bbclass
> @@ -607,7 +607,7 @@ python package_do_filedeps() {
>  
>         # Determine dependencies
>         for pkg in packages.split():
> -               if pkg.endswith('-dbg'):
> +               if pkg.endswith('-dbg') or pkg.find('-locale-') != -1 or pkg.find('-localedata-') != -1 or pkg.find('-gconv-') != -1  or pkg.find('-charmap-') != -1:
>                         continue
>  
>                 # Process provides
> 
> The result was:
> 
> Tue Feb  1 10:06:46 2011    profile-eglibc_2.12.bb-do_package.log
> 
>          8106962 function calls (7768817 primitive calls) in 594.886 CPU seconds
> 
>    Ordered by: internal time
> 
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>       403  308.929    0.767  308.929    0.767 {posix.waitpid}
>    134054   72.907    0.001   72.907    0.001 {method 'read' of 'file' objects}
>       828   34.643    0.042   34.795    0.042 package_do_filedeps:12(process_deps)
>       866   29.347    0.034   29.347    0.034 {posix.system}
>    225558   23.459    0.000   23.459    0.000 {posix.stat}
>     85562   18.980    0.000   18.980    0.000 {posix.chmod}
>    171317   17.198    0.000   17.198    0.000 {posix.lstat}
>     55393   14.378    0.000   14.378    0.000 {open}
>     25824   13.719    0.001   13.719    0.001 {posix.rename}
>      2280    5.383    0.002    5.383    0.002 {method 'readlines' of 'file' objects}
>      6403    4.877    0.001    5.762    0.001 insane.bbclass:1(package_qa_hash_style)
>     19214    4.843    0.000    4.843    0.000 {posix.mkdir}
>      2947    3.867    0.001    3.867    0.001 {posix.popen}
> 
> So overall we dropped from 878 seconds to 595 seconds and
> package_do_filedeps:12(process_deps) dropped from 322 to 34 seconds. 

This makes sense.  The locale files don't have any dependencies in them, as far
as I am aware...  and it is a ton of files to process within glibc...

> FWIW, the waitpid above is from the binary locale generation, not from
> filedeps and we know that is "slow" in the libc case although the cross
> locale generation speeds it up a lot. The reason process_deps is showing
> so badly on the profile is not because that python function is slow (as
> it usually the case) but because its reading from a pipe which blocks on
> the output of the rpmdeps process.

Is there a faster mechanism we could use -- or is this actually efficient, but
not showing it in the trace?

> I did take a look at the perfile_rpmdeps.sh script and rpmdeps itself. I
> suspect we might be faster if we called "rpmdeps --requires --provides
> -v" on each file ourselves and handle the prefix the -v option adds to
> differentiate between the requires and provides. There is a sed
> operation in the script I didn't 100% understand that we'd need to
> account for too. Watching cpu usage whilst manually running the script
> was not encouraging :/.

The sed actually accomplished two things.  The first is removes instances of the
dependency "rpmlib(...)".  This dependency is added by RPM to track specific rpm
versioning requirements.  Things such as symlink validation, compression
methods, etc.  None of these are needed within the Poky environment.

The second thing is does is translates the dependency equations into a Poky
style equation.

In rpm versioned dependencies are normally like:

foo = 1.2-3

However in Poky the format is different:

foo (= 1.2-3)

So the sed finds these, and simply moves them into parens.

We may be able to move the sed out of the direct loop and make it a single call
over the full run.  I think this will work, and it's likely more efficient as well.

As for calling rpmdeps with the -v, we can switch that mode if it makes it
quicker to process the results, it just becomes slightly more complicated to
process.  With the method I'm using now, all of the dependencies return on a
single line prefixed by the filename.  If we switch to "-v", then we would
likely need to do "filename:\ndep1\ndep2\n" etc..  I'm sure we could process
that in python, but the difficulty wasn't worth it at the time.

> So these findings along with your thoughts should give us some avenues
> to explore. Do you thing skipping locale packages is ok to do? Maybe
> kernel modules too?

locales should be fine.  Skipping kernel modules should be fine as well.  I
don't believe there is any data that is learned from the .ko files.

--Mark

> There are also other things to pick off in that profile list such as the
> 200,000 stat calls (from sstate?), the insane check thats showing up and
> what all the read calls are doing.
> 
> Cheers,
> 
> Richard
> 
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Quick hack for profiling tasks
  2011-02-01 15:38       ` Mark Hatle
@ 2011-02-02 14:28         ` Richard Purdie
  2011-02-03 11:22           ` Richard Purdie
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Purdie @ 2011-02-02 14:28 UTC (permalink / raw)
  To: Mark Hatle; +Cc: poky

On Tue, 2011-02-01 at 09:38 -0600, Mark Hatle wrote:
> I'm still used to the RPM world where locales are included in the main package
> and "tagged" so they will or won't be included automatically.  I forget they are
> separated out in Poky by default.
> 
> So skipping them makes sense to me...

Some further numbers to backup my comments:

"time bitbake poky-image-sato -k"

Before patch:

real 129m40.482s
user 480m16.520s
sys 66m9.170s

After patch:

real 117m7.704s
user 455m40.330s
sys 55m1.390s

The patch in question:

diff --git a/meta/classes/package.bbclass b/meta/classes/package.bbclass
index 856858c..fbc1463 100644
--- a/meta/classes/package.bbclass
+++ b/meta/classes/package.bbclass
@@ -607,7 +607,7 @@ python package_do_filedeps() {
 
 	# Determine dependencies
 	for pkg in packages.split():
-		if pkg.endswith('-dbg'):
+		if pkg.endswith('-dbg') or pkg.find('-locale-') != -1 or pkg.find('-localedata-') != -1 or pkg.find('-gconv-') != -1  or pkg.find('-charmap-') != -1 or pkg.startswith('kernel-module-') != -1:
 			continue
 
 		# Process provides


So I think this is a sufficient improvement for us to include it unless
there are any objections from anyone.

> > FWIW, the waitpid above is from the binary locale generation, not from
> > filedeps and we know that is "slow" in the libc case although the cross
> > locale generation speeds it up a lot. The reason process_deps is showing
> > so badly on the profile is not because that python function is slow (as
> > it usually the case) but because its reading from a pipe which blocks on
> > the output of the rpmdeps process.
> 
> Is there a faster mechanism we could use -- or is this actually efficient, but
> not showing it in the trace?

I was thinking this is ok efficiency wise, it just means the function
will look to take a long time on the graphs. The result above surprises
me though as we shrunk the kernel (sys) time by over 10 minutes. This
could mean we're trashing a lot with context switching using this
approach. There is no reason we can't log to a file and then read the
file for processing rather than context switch.

> > I did take a look at the perfile_rpmdeps.sh script and rpmdeps itself. I
> > suspect we might be faster if we called "rpmdeps --requires --provides
> > -v" on each file ourselves and handle the prefix the -v option adds to
> > differentiate between the requires and provides. There is a sed
> > operation in the script I didn't 100% understand that we'd need to
> > account for too. Watching cpu usage whilst manually running the script
> > was not encouraging :/.
> 
> The sed actually accomplished two things.  The first is removes instances of the
> dependency "rpmlib(...)".  This dependency is added by RPM to track specific rpm
> versioning requirements.  Things such as symlink validation, compression
> methods, etc.  None of these are needed within the Poky environment.

Ok, this makes sense. We could always just skip these in the python
processing code.

> The second thing is does is translates the dependency equations into a Poky
> style equation.
> 
> In rpm versioned dependencies are normally like:
> 
> foo = 1.2-3
> 
> However in Poky the format is different:
> 
> foo (= 1.2-3)
> 
> So the sed finds these, and simply moves them into parens.
> 
> We may be able to move the sed out of the direct loop and make it a single call
> over the full run.  I think this will work, and it's likely more efficient as well.

Agreed, either we make it a single call, or we can easily adapt the
python to handle this conversion too.

> As for calling rpmdeps with the -v, we can switch that mode if it makes it
> quicker to process the results, it just becomes slightly more complicated to
> process.

My thoughts here are that we could well half the execution time by doing
everything in one exec call, not two. The additional overhead for
handling the prefix in python is negligible.

>   With the method I'm using now, all of the dependencies return on a
> single line prefixed by the filename.  If we switch to "-v", then we would
> likely need to do "filename:\ndep1\ndep2\n" etc..  I'm sure we could process
> that in python, but the difficulty wasn't worth it at the time.

Fair enough but I think there is something odd going on with that
function we need to get to the bottom of as it does seem to be eating a
large amount of time, possibly when it needn't.

Cheers,

Richard



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Quick hack for profiling tasks
  2011-02-02 14:28         ` Richard Purdie
@ 2011-02-03 11:22           ` Richard Purdie
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Purdie @ 2011-02-03 11:22 UTC (permalink / raw)
  To: Mark Hatle; +Cc: poky

On Wed, 2011-02-02 at 14:28 +0000, Richard Purdie wrote:
> On Tue, 2011-02-01 at 09:38 -0600, Mark Hatle wrote:
> > I'm still used to the RPM world where locales are included in the main package
> > and "tagged" so they will or won't be included automatically.  I forget they are
> > separated out in Poky by default.
> > 
> > So skipping them makes sense to me...
> 
> Some further numbers to backup my comments:
> 
> "time bitbake poky-image-sato -k"
> 
> Before patch:
> 
> real 129m40.482s
> user 480m16.520s
> sys 66m9.170s
> 
> After patch:
> 
> real 117m7.704s
> user 455m40.330s
> sys 55m1.390s

and just for reference, with the recent fetcher improvements:

real	113m39.607s
user	458m25.570s
sys	55m15.250s

I'm doubting the accuracy of the user figure there but the build should
be faster overall due to less taring/untaring of tarballs and it looks
like it is. It would be possible to really show an improvement by
specifically profiling the kernel/linux-libc-headers recipes.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-02-03 11:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-01  0:28 Quick hack for profiling tasks Richard Purdie
2011-02-01  0:40 ` Richard Purdie
2011-02-01  1:43   ` Mark Hatle
2011-02-01 11:05     ` Richard Purdie
2011-02-01 15:38       ` Mark Hatle
2011-02-02 14:28         ` Richard Purdie
2011-02-03 11:22           ` Richard Purdie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.