From: Martin Jansa <martin.jansa@gmail.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: OE-core <openembedded-core@lists.openembedded.org>,
Peter Kjellerstedt <peter.kjellerstedt@axis.com>
Subject: Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
Date: Fri, 16 Aug 2019 12:24:41 +0200 [thread overview]
Message-ID: <20190816102441.GB1515@jama> (raw)
In-Reply-To: <20190815150548.GA1515@jama>
[-- Attachment #1: Type: text/plain, Size: 7801 bytes --]
On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote:
> On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> > On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > > Comparing that build to a corresponding do-nothing build with Thud,
> > > the time difference matches those three minutes where I have no idea
> > > what bitbake is doing now that it didn’t need to do before…
> > >
> > > Hopefully these time degradations can be solved, because the current
> > > state of bitbake is barely usable. We also need to look into possible
> > > ways to improve the cooker output when it is running the setscene
> > > tasks so it makes some kind of sense again.
> >
> > We talked on irc and you pointed at the commit things started to go
> > wrong. Just to summarise things for the benefit of the list, this is
> > some quick testing I did:
> >
> > "bitbake -p; time bitbake core-image-minimal -n"
> >
> > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > 40.3s 7df31ff36892c2f9c65326b06b4c70
> > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> > 76.9s master-next
>
> I know I'm late to the party, but it took really long for the test to finish..
>
> I'm not using poky, so reproduce this testing on our builds I've used
> bitbake revisions matching with poky revision RP was testing:
>
> + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because current master isn't compatible with old bitbake (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], expression was ${@oe_import(d)} which triggered exception AttributeError: module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
> and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to possibly eliminate impact of metadata changes)
>
> tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM
>
> nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server configured
>
> on a build with few more layers:
> Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, 1927 skipped, 54 masked, 0 errors.
>
> but first doing just core-image-minimal like RP did:
>
> time poky bitbake oe-core
> 2m8.191s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> N/A a0d941c787cf3ef030d190903279d311bc05d752 doesn't exist in poky/poky-contrib 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m17.053s 7df31ff36892c2f9c65326b06b4c70 1f630fdf0260db08541d3ca9f25f852931c19905 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e f43778c2d19e70d4befd483b06aaf247fc65c799 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c 8c26b451f22193ef1c544e2017cc84515566c1b8
> 2m38.185s master-next fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> cache.py:446: ResourceWarning: unclosed <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 41117)>
> value = pickled.load()
> started showing with this revision
> 2m17.991s master-next + fixups f7cd14f90463957b3e4be6d3876def98b78f1424
> 2m9.651s master-next + "Holding off tasks %s" out
>
> now world
> 253m36.637s 7df31ff36892c2f9c65326b06b4c70 1f630fdf0260db08541d3ca9f25f852931c19905 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 174m13.324s a0542ed3ff700eca35f9195f743c9e28bcd50f3e f43778c2d19e70d4befd483b06aaf247fc65c799 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> this is time when killed at "NOTE: Executing Tasks" without showing any progress on the tasks (unlike other tests) and only one bitbake process is running
> 633m27.475s master-next fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> this is time when killed at (1417 from 71749) - running very slowly unlike other bitbake revisions where the task number changes relatively quickly - about 10 tasks per second
> 89m13.992s master-next + "Holding off tasks %s" out
> 89m59.201s master-next updated today 85fe627fdb6510f0942917964386fad9d8c479c8
>
> Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
> slower than recent master-next.
>
> I'll send -P output next.
So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because
this commit introduced that
logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))
Here is -P output from it:
5min (til "Executing Tasks" message):
profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.return.at.Executing.Tasks
http://paste.ubuntu.com/p/Nw6n5hTPmF/
5hours (whole dry run):
profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.whole.dry.run
http://paste.ubuntu.com/p/sydFGvYTxQ/
profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.worker.log.processed
http://paste.ubuntu.com/p/RMGbm9DyZz/
So for good old baseline I need to go few commits back, to e.g. 18d4a31f RP was testing before:
1f630fdf runqueue: Enable dynamic task adjustment to hash equivalency
a0d941c7 siggen: Convert to use self.unitaskhashes
1f326f2c siggen: Add new unitaskhashes data variable which is cached
b707d0cb runqueue: Improve scenequeue processing logic
19a6e356 cache: Add SimpleCache class
7bb79099 siggen: Import unihash code from OE-Core
7aec8632 hashserv: SQL Optimizations
18d4a31f fetch2/wget: avoid 'maximum recursion depth' RuntimeErrors when handling 403 codes
94m19.081s
profile.log.processed.18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.whole.dry.run
http://paste.ubuntu.com/p/Jnvv8GJNKB/
And here is profile with latest master from yesterday:
90mins:
profile.log.processed.a5e084a266f63c2fd370122327615e49beaeb94e.whole.dry.run
http://paste.ubuntu.com/p/YhMCHPmg56/
So good news is that current master is a bit faster than "good old" 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.
But still significantly slower on my fast builder compared to world tests on poky + meta-oe,
not sure if it's all just because the size of metadata in my builds (with 71K tasks).
Even the dry running the tasks was about 10 tasks/second while RP was seeing around 200 tasks/second.
Will try to bump BB_NUMBER_THREADS from 8 to 72.
I've already disabled rm_work and PRserv, but that doesn't seem to have any significant impact on performance here. Other than resolving the PRserv terminating and then warnings about lost connection.
Cheers,
>
> >
> > So basically the original changes showed a 25% hit but the performance
> > of -next is dire.
> >
> > This is with no hash equiv server configured.
> >
> > It will vary depending on the target used (numbers with -sato for the
> > above would be interesting for comparision) and how much was or is in
> > sstate, they type of sstate mirror configured and so on.
> >
> > I really need to focus on getting the new code functioning correctly
> > before we attempt to optimise but if nobody tests the new code due to
> > performance problems we have a different issue. We also have a scaling
> > problem with the hash server itself I need to fix to stop the
> > autobuilder throwing weird errors. I'm therefore a bit challenged on
> > where to start with it all :/.
> >
> > Cheers,
> >
> > Richard
> >
> >
> >
> > --
> > _______________________________________________
> > Openembedded-core mailing list
> > Openembedded-core@lists.openembedded.org
> > http://lists.openembedded.org/mailman/listinfo/openembedded-core
>
> --
> Martin 'JaMa' Jansa jabber: Martin.Jansa@gmail.com
--
Martin 'JaMa' Jansa jabber: Martin.Jansa@gmail.com
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 201 bytes --]
next prev parent reply other threads:[~2019-08-16 10:24 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-12 20:26 Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively) Peter Kjellerstedt
2019-08-13 9:04 ` Richard Purdie
2019-08-13 13:20 ` Alexander Kanavin
2019-08-13 19:18 ` Richard Purdie
2019-08-14 4:06 ` Khem Raj
2019-08-14 11:25 ` Alexander Kanavin
2019-08-14 11:36 ` richard.purdie
2019-08-14 12:08 ` Alexander Kanavin
2019-08-14 12:43 ` Alexander Kanavin
2019-08-14 12:50 ` Mikko.Rapeli
2019-08-14 12:55 ` richard.purdie
2019-08-14 14:57 ` Peter Kjellerstedt
2019-08-14 15:19 ` Khem Raj
2019-08-14 20:27 ` Alexander Kanavin
2019-08-14 21:25 ` richard.purdie
2019-08-15 12:56 ` Alexander Kanavin
2019-08-15 13:56 ` richard.purdie
2019-08-15 22:44 ` Richard Purdie
2019-08-15 15:05 ` Martin Jansa
2019-08-16 10:24 ` Martin Jansa [this message]
2019-08-16 15:00 ` Martin Jansa
2019-08-16 15:54 ` richard.purdie
2019-08-16 17:22 ` Martin Jansa
2019-08-19 21:23 ` Martin Jansa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190816102441.GB1515@jama \
--to=martin.jansa@gmail.com \
--cc=openembedded-core@lists.openembedded.org \
--cc=peter.kjellerstedt@axis.com \
--cc=richard.purdie@linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.