Illustration of task latency

Openembedded Core Discussions
 help / color / mirror / Atom feed

* Illustration of task latency
@ 2011-04-08 15:41 Richard Purdie
  2011-04-08 17:29 ` [poky] " Richard Purdie
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Purdie @ 2011-04-08 15:41 UTC (permalink / raw)
  To: openembedded-core, poky

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]

We've done a lot of good work in optimising task execution and latency
however I was looking at some of our build data and I think we still
have work to do. The attached picture is a "bootchart" style picture
showing the task execution at the start of a build. Each vertical
division represents one second. As you can see, the fetch/patch tasks
themselves run very quickly. Those markers are from the task entry/exit.

The worrying part is the empty space between the tasks which looks to be
about 1.5 seconds in size *per task*. This is a scary amount of latency
to be losing for task execution when you consider we have 4500 of these
for our reference build.

More investigation is needed to figure out where this time is being
spent but I'm guessing a significant fraction is in parsing the recipe
of the task that is about to be executed. We need to confirm that but if
that is the case but one possible optimisation is we should ensure we're
not parsing every BBCLASSEXTEND variant of the recipe, just the one we
want to run. I'm sure there are other ways we might be able to improve
this too.

Cheers,

Richard

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [poky] Illustration of task latency
  2011-04-08 15:41 Illustration of task latency Richard Purdie
@ 2011-04-08 17:29 ` Richard Purdie
  2011-04-09 16:26   ` Richard Purdie
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Purdie @ 2011-04-08 17:29 UTC (permalink / raw)
  To: openembedded-core; +Cc: poky

On Fri, 2011-04-08 at 08:41 -0700, Richard Purdie wrote:
> We've done a lot of good work in optimising task execution and latency
> however I was looking at some of our build data and I think we still
> have work to do. The attached picture is a "bootchart" style picture
> showing the task execution at the start of a build. Each vertical
> division represents one second. As you can see, the fetch/patch tasks
> themselves run very quickly. Those markers are from the task entry/exit.
> 
> The worrying part is the empty space between the tasks which looks to be
> about 1.5 seconds in size *per task*. This is a scary amount of latency
> to be losing for task execution when you consider we have 4500 of these
> for our reference build.
> 
> More investigation is needed to figure out where this time is being
> spent but I'm guessing a significant fraction is in parsing the recipe
> of the task that is about to be executed. We need to confirm that but if
> that is the case but one possible optimisation is we should ensure we're
> not parsing every BBCLASSEXTEND variant of the recipe, just the one we
> want to run. I'm sure there are other ways we might be able to improve
> this too.

To follow up on this, I added instrumentation to runqueue.py
fork_off_task(). There are two things of note. On my laptop which is a
much slower system than the one I showed in the benchmark (a quad core),
the task overhead isn't 1+ seconds as shown in the graph but 0.4 seconds
per task. I think the reason is there is a compile job running in
parallel on the other graph which is starving new tasks of CPU.

The second thing of note is 95% of the 0.4 seconds are in the
loadDataFull() call as suspected. I did notice that the logging changes
around that point in the code can distort timings but the time is really
being spent there.

Cheers,

Richard




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [poky] Illustration of task latency
  2011-04-08 17:29 ` [poky] " Richard Purdie
@ 2011-04-09 16:26   ` Richard Purdie
  2011-04-10 18:04     ` Richard Purdie
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Purdie @ 2011-04-09 16:26 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer; +Cc: poky

On Fri, 2011-04-08 at 10:29 -0700, Richard Purdie wrote:
> To follow up on this, I added instrumentation to runqueue.py
> fork_off_task(). There are two things of note. On my laptop which is a
> much slower system than the one I showed in the benchmark (a quad core),
> the task overhead isn't 1+ seconds as shown in the graph but 0.4 seconds
> per task. I think the reason is there is a compile job running in
> parallel on the other graph which is starving new tasks of CPU.
> 
> The second thing of note is 95% of the 0.4 seconds are in the
> loadDataFull() call as suspected. I did notice that the logging changes
> around that point in the code can distort timings but the time is really
> being spent there.

I've dug into this a bit. When running a given task, it is parsing all
bbclassextend variants again. I have a simple patch which fixes that
which I'll cleanup and share.

What I also noticed which was more odd was that the first finalise()
call was taking 0.2s, subsequent ones were taking 0.1s. It turns out
that the parsercache which is used by the siggen code isn't functioning
the way it should with a lot of cache misses. I think this is related to
the parallel parsing and only saving out cached data from the core, not
the individual recipes (i.e. the cache from the parser subthreads). I
suspect we could get some performance improvement by fixing this.

Cheers,

Richard

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [poky] Illustration of task latency
  2011-04-09 16:26   ` Richard Purdie
@ 2011-04-10 18:04     ` Richard Purdie
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Purdie @ 2011-04-10 18:04 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer; +Cc: poky

On Sat, 2011-04-09 at 09:26 -0700, Richard Purdie wrote:
> I've dug into this a bit. When running a given task, it is parsing all
> bbclassextend variants again. I have a simple patch which fixes that
> which I'll cleanup and share.
> 
> What I also noticed which was more odd was that the first finalise()
> call was taking 0.2s, subsequent ones were taking 0.1s. It turns out
> that the parsercache which is used by the siggen code isn't functioning
> the way it should with a lot of cache misses. I think this is related to
> the parallel parsing and only saving out cached data from the core, not
> the individual recipes (i.e. the cache from the parser subthreads). I
> suspect we could get some performance improvement by fixing this.

Fixes for the issues I noticed are in:

http://git.pokylinux.org/cgit/cgit.cgi/poky-contrib/log/?h=rpurdie/bboptimise

It turns out executing a function at process exit time with
multiprocessing is quite tricky and I'm not 100% sure I'm using public
API with the approach I've ended up using.

The speedups have a good feel to them and show the right things with
benchmarks. It will be interesting to see how much affect they have on
real world build time (which I'll test when I get home next week).
They're also showing a significant reduction in parsing time.

I also noticed that skipped recipes were always being reparsed which is
suboptimial and I've included a work in progress fix for that in the
above branch too.

Cheers,

Richard

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-04-10 18:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-08 15:41 Illustration of task latency Richard Purdie
2011-04-08 17:29 ` [poky] " Richard Purdie
2011-04-09 16:26   ` Richard Purdie
2011-04-10 18:04     ` Richard Purdie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox