* Checksums in Bitbake
@ 2010-03-24 13:27 Richard Purdie
2010-03-24 15:15 ` Frans Meulenbroeks
0 siblings, 1 reply; 10+ messages in thread
From: Richard Purdie @ 2010-03-24 13:27 UTC (permalink / raw)
To: bitbake-dev; +Cc: openembedded-devel
I've written down some of my brainstorming on checksums in bitbake
below. Thoughts are welcome...
For a variety of reasons I dislike our current stamp files and believe
they limit what we can do in the future with bitbake. We only use them
on a per recipe basis and actively block interaction between recipes.
Trying to use them in any kind of staging packages turns into a world of
pain. They're also not portable between different systems when you start
to consider timezones. Having looked at what other build systems do,
particularly e2factory, I love the idea of checksums and think these are
the future.
The idea is simple. A given set of metadata is condensed down into a
checksum. If the metadata changes, the checksum changes. If the checksum
matches, the input into a task matches and hence the output should be
the same.
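A minimal sketch of the idea above, in present-day Python. The function name `metadata_checksum`, the choice of SHA-256 and the sample variables are assumptions for illustration only; nothing here is taken from BitBake itself.

```python
import hashlib

def metadata_checksum(metadata):
    """Return a hex digest for a dict mapping variable names to values."""
    digest = hashlib.sha256()
    # Sort the keys so the result does not depend on dictionary ordering.
    for key in sorted(metadata):
        digest.update(f"{key!r}={metadata[key]!r}\n".encode("utf-8"))
    return digest.hexdigest()

original = {"PV": "1.0", "SRC_URI": "http://example.com/foo-1.0.tar.gz"}
changed = dict(original, PV="1.1")
reverted = dict(changed, PV="1.0")

print(metadata_checksum(original) == metadata_checksum(changed))   # False
print(metadata_checksum(original) == metadata_checksum(reverted))  # True
```

Changing a value changes the checksum, and changing it back restores the original checksum, which is the property the proposal relies on.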
The implications are far reaching. If you change the metadata, it
automatically rebuilds what you changed. If you change it back to
exactly what it was, the original staging package would become valid
again and be reused. PR bumps for recipe changes could be a thing of the
past, at least to trigger rebuilds of packages. From a package manager
standpoint they're still needed of course but it opens up the idea of
automation.
In theory these would also make it much easier to tell whether a given
staging package is valid or not rather than all the current messing
around with stamp files and dates.
So the theory is nice, the practicalities of implementing it are less
so.
First, the easy bit. The STAMP variable and directory are still perfectly
fine; we'd just append the checksum onto the STAMP name and the stamp
files would lose the meaning of their time/date. This means most of our
existing hacking on the stamp directory would actually still work.
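As a rough illustration of appending the checksum to the stamp name, the sketch below treats a task as current when a stamp file carrying the matching checksum exists, ignoring its timestamp. The helper names and the `<STAMP>.<task>.<checksum>` layout are assumptions, not an existing BitBake convention.

```python
import os
import tempfile

def stamp_path(stamp, task, checksum):
    """Return a stamp file name of the form <STAMP>.<task>.<checksum>."""
    return f"{stamp}.{task}.{checksum}"

def write_stamp(stamp, task, checksum):
    """Create an empty stamp file; its contents and mtime carry no meaning."""
    path = stamp_path(stamp, task, checksum)
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w"):
        pass
    return path

def task_is_current(stamp, task, checksum):
    """A task is current if a stamp with the same checksum exists."""
    return os.path.exists(stamp_path(stamp, task, checksum))

with tempfile.TemporaryDirectory() as tmp:
    stamp = os.path.join(tmp, "stamps", "foo-1.0-r0")
    write_stamp(stamp, "do_compile", "abc123")
    current = task_is_current(stamp, "do_compile", "abc123")
    stale = task_is_current(stamp, "do_compile", "def456")

print(current, stale)  # True False
```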
Bitbake would need to generate these stamps as part of its parsing
process. I'd suggest this is controlled by some metadata variable like
BBCHECKSUMS = "1" turning on this functionality. If enabled, at the end
of the finalise function, the data dictionary would be turned into a
huge text string and a checksum generated of this.
Do we just add everything to this string? That can't work since we have
some paths such as WORKDIR which we don't want to affect the checksum.
We also have variables like DATETIME which change and these probably
shouldn't be reflected in the stamp. So do we blacklist or whitelist?
I'm in favour of blacklisting "bad" variables since it's simpler to
maintain and hopefully less error prone. Blacklisting in practical terms
means taking a copy of the data store and setting these variables to
some known value before string expansion. For DATETIME a different type
of blacklisting may be better where the variable is just excluded from
the checksum unless some other variable pulls it in. This could then be
used to our advantage to make the "nostamp" tasks like image generation
always run?
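The two kinds of blacklisting described above could be sketched as follows. This is only an illustration under stated assumptions: the datastore is modelled as a plain dict, `${VAR}` expansion is a simplified stand-in for BitBake's own, and the helper names and neutral values are hypothetical.

```python
import hashlib
import re

_REF = re.compile(r"\$\{(\w+)\}")

def expand(value, store, depth=0):
    """Expand ${VAR} references from store; unknown names are left untouched."""
    if depth > 20:
        raise ValueError("variable reference loop")
    def replace(match):
        name = match.group(1)
        if name not in store:
            return match.group(0)
        return expand(store[name], store, depth + 1)
    return _REF.sub(replace, value)

def checksum(store, neutralised, excluded):
    work = dict(store)  # work on a copy of the data store
    for name, known_value in neutralised.items():
        if name in work:
            work[name] = known_value  # set to a known value before expansion
    digest = hashlib.sha256()
    for name in sorted(work):
        if name in excluded:
            continue  # not hashed directly, only via variables that reference it
        digest.update(f"{name}={expand(work[name], work)}\n".encode("utf-8"))
    return digest.hexdigest()

NEUTRALISED = {"WORKDIR": "/neutral/workdir"}  # hypothetical known value
EXCLUDED = {"DATETIME"}

base = {
    "WORKDIR": "/home/a/tmp/work/foo",
    "S": "${WORKDIR}/foo-1.0",
    "DATETIME": "20100324132700",
    "do_compile": "make -C ${S}",
}
other = dict(base, WORKDIR="/srv/build/work/foo", DATETIME="20100325090000")
image_a = dict(base, IMAGE_NAME="img-${DATETIME}")
image_b = dict(other, IMAGE_NAME="img-${DATETIME}")

print(checksum(base, NEUTRALISED, EXCLUDED) == checksum(other, NEUTRALISED, EXCLUDED))      # True
print(checksum(image_a, NEUTRALISED, EXCLUDED) == checksum(image_b, NEUTRALISED, EXCLUDED))  # False
```

The second comparison differs because IMAGE_NAME pulls DATETIME in, which is the behaviour suggested for tasks that should always run.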
Checksums should really be per task. In a perfect world the checksums
would only include variables in their scope so if PACKAGES changes, only
the packaging task itself would rerun. If you change the do_install
function, only the do_install task would rerun. Is it possible to
achieve this level of functionality? I've spent a while wondering about
this.
For shell tasks I think that it is. The expanded shell function is
exactly what runs and if that script changes, the checksum can change.
The main problem is that we don't currently track which shell functions
depend on which other shell functions. Bitbake can know the list of
possible shell function calls that can be made. Using simple searching
it should be possible to work out who calls which functions relatively
easily. We can provide a mechanism to inject missing dependencies caused
by obfuscated calls we can't detect although I can't think of many of
these offhand. We can easily test this by making exec_task only export
dependent functions instead of currently exporting all shell functions.
That would be a nice improvement in itself anyway.
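The "simple searching" mentioned above might look something like the sketch below. It assumes the set of shell functions is available as a dict of name to body, and it treats any whole-word mention of a known function name as a call, so it can over-report (for example when a name appears inside a string). The helper names are made up for illustration.

```python
import re

def shell_function_deps(functions):
    """Map each shell function to the other known functions its body mentions."""
    deps = {}
    for name, body in functions.items():
        called = set()
        for other in functions:
            if other == name:
                continue
            # Match the name as a whole word, not as part of a longer identifier.
            pattern = r"(?<![\w-])" + re.escape(other) + r"(?![\w-])"
            if re.search(pattern, body):
                called.add(other)
        deps[name] = called
    return deps

def closure(name, deps):
    """Return every function needed to run name, including name itself."""
    needed, todo = set(), [name]
    while todo:
        current = todo.pop()
        if current in needed:
            continue
        needed.add(current)
        todo.extend(deps.get(current, ()))
    return needed

functions = {
    "do_install": "oe_runmake install\ninstall_extras",
    "do_compile": "oe_runmake",
    "oe_runmake": 'make "$@" || die "oe_runmake failed"',
    "die": 'echo "$@" >&2; exit 1',
    "install_extras": "install -m 0644 README ${D}/usr/share/doc",
}
deps = shell_function_deps(functions)
print(sorted(closure("do_compile", deps)))  # ['die', 'do_compile', 'oe_runmake']
```

Exporting only `closure(task, deps)` for a task would be one way to test whether the detected dependencies are complete.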
For python tasks this is harder. I suspect we can find out about
function call dependencies by inspecting the AST. What we can't easily
know is which variables in the data store a given function accesses and
depends upon. We could assume it depends on all datastore variables
unless the function explicitly declares its dependencies (useful for
do_patch which really only depends on SRC_URI)?
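A sketch of the AST inspection idea, using Python's standard `ast` module. It assumes datastore reads go through a method or function called `getVar` with the variable name as the first argument. Only literal names can be resolved this way; computed names are merely flagged, which illustrates why the variable dependencies are the hard part. The function `python_deps` and the sample task body are hypothetical.

```python
import ast

def python_deps(source, getter_names=("getVar",)):
    """Return (called function names, literal variable names, has_dynamic_access)."""
    calls, variables, dynamic = set(), set(), False
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            calls.add(func.id)  # plain function call such as apply_patch(...)
        elif isinstance(func, ast.Attribute) and func.attr in getter_names:
            first = node.args[0] if node.args else None
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                variables.add(first.value)
            else:
                dynamic = True  # name computed at run time, cannot be resolved here
    return calls, variables, dynamic

SAMPLE = '''
def do_patch(d):
    src_uri = d.getVar("SRC_URI", True)
    for url in src_uri.split():
        apply_patch(url, d)
    extra = d.getVar(pick_name(d), True)
'''

calls, variables, dynamic = python_deps(SAMPLE)
print(sorted(calls), sorted(variables), dynamic)
# ['apply_patch', 'pick_name'] ['SRC_URI'] True
```

When `dynamic` is true, falling back to "depends on all datastore variables" unless the function declares its dependencies would match the suggestion above.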
So, that is where I'm up to in thinking about this. I'd welcome other
input from people and whether people think this idea is worth pursuing.
I think even a simplistic implementation covering the whole data store
would be better than what we currently have though. As always it's worth
exploring how far we can push the model now though and I'm quite
optimistic about what we can achieve!
Cheers,
Richard
* Re: Checksums in Bitbake
2010-03-24 13:27 Checksums in Bitbake Richard Purdie
@ 2010-03-24 15:15 ` Frans Meulenbroeks
2010-03-24 15:51 ` Chris Larson
0 siblings, 1 reply; 10+ messages in thread
From: Frans Meulenbroeks @ 2010-03-24 15:15 UTC (permalink / raw)
To: openembedded-devel; +Cc: bitbake-dev
Richard,
Interesting ideas.
I need to let this digest a little bit.
Some initial thoughts:
The checksum should also depend on the checksum of the underlying
packages. E.g. if A depends on B and the checksum of B changes it
should trigger a rebuild of A.
A first crude approach would be to have a hash of the concatenation of
the unfolded recipe (so with all includes/requires expanded) and the
hashes of the recipes it depends on. Of course this is very rough as
even changing whitespace in a recipe will lead to a recompile.
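The crude approach above can be sketched as a recursive hash. The sketch assumes each recipe is available as its unfolded text plus a list of the recipes it depends on, and that the dependency graph has no cycles; `recipe_hash` and the sample recipes are illustrative only.

```python
import hashlib

def recipe_hash(name, recipes, cache=None):
    """Hash a recipe's unfolded text together with the hashes of its dependencies."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    text, depends = recipes[name]
    digest = hashlib.sha256(text.encode("utf-8"))
    for dep in sorted(depends):
        digest.update(recipe_hash(dep, recipes, cache).encode("utf-8"))
    cache[name] = digest.hexdigest()
    return cache[name]

recipes = {
    "B": ('PV = "1.0"', []),
    "A": ('DEPENDS = "B"', ["B"]),
    "C": ('PV = "2.0"', []),
}
before = {name: recipe_hash(name, recipes) for name in recipes}

# Changing B also changes A's hash, while the unrelated C is unaffected.
recipes_changed = dict(recipes, B=('PV = "1.1"', []))
after = {name: recipe_hash(name, recipes_changed) for name in recipes_changed}

print([name for name in sorted(recipes) if before[name] != after[name]])  # ['A', 'B']
```

As noted, hashing the raw text means that even a whitespace-only edit to B would change both hashes.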
A different approach would be to let it depend on PV + PR. That'll put
the developer in control (with all related issues, like the developer
not bumping PR).
And yet a different one would be to use variables and functions from the recipe.
I have mixed feelings on whether checksums also would depend on global
vars (e.g. code generated by the classes or variables in e.g.
local.conf).
On the one hand it seems pretty neat, on the other hand I worry about
performance (calculating the checksum).
Maybe a solution would be to have a hash or so per file and
concatenate all hashes on which a recipe depends (including class
files).
As said, just some initial thoughts.
Hope this helps.
Frans.
* Re: Checksums in Bitbake
2010-03-24 15:15 ` Frans Meulenbroeks
@ 2010-03-24 15:51 ` Chris Larson
2010-03-24 20:17 ` Frans Meulenbroeks
0 siblings, 1 reply; 10+ messages in thread
From: Chris Larson @ 2010-03-24 15:51 UTC (permalink / raw)
To: openembedded-devel; +Cc: bitbake-dev
On Wed, Mar 24, 2010 at 8:15 AM, Frans Meulenbroeks <
fransmeulenbroeks@gmail.com> wrote:
> Interesting ideas.
> I need to let this digest a little bit.
>
> Some initial thoughts
> The checksum should also depend on the checksum of the underlying
> packages. E.g. if A depends on B and the checksum of B changes it
> should trigger a rebuild of A.
>
I don't think this is a very good idea, personally. As an option, perhaps,
but we do things the way we do for a reason: just because a dep of mine is
rebuilt doesn't automatically require that I be rebuilt. I'd suggest moving
to an alternative which encodes the library ABI and incorporates that into
the hashes of things that depend upon it, but we can certainly do what you
want as an optional feature.
> A first crude approach would be to have a hash of the concatenation of
> the unfolded recipe (so with all includes/requires expanded) and the
> hashes of the recipes it depends on). Of course this is very rough as
> even changing whitespace in a recipe will lead to a recompile.
> A different approach would be to let it depend on PV + PR. That'll put
> the developer in control (with all related issues, like the developer
> not bumping PR).
> And yet a different one would be to use variables and functions from the
> recipe.
>
> I have mixed feelings on whether checksums also would depend on global
> vars (e.g. code generated by the classes or variables in e.g.
> local.conf).
> On the one hand it seems pretty neat, on the other hand I worry about
> performance (calculating the checksum).
>
Global variables should absolutely be included, imo. The reason for going
with a blacklist rather than a whitelist approach is to, as Richard says,
make it less error prone. It ensures that the failure mode is something
being rebuilt, rather than using possibly incorrect binaries. I'd rather it
take a bit longer to build than result in questionable output. If
the time spent calculating checksums becomes a concern, which I doubt, you could
hash the configuration metadata at ConfigParsed time and incorporate that
hash into the hash generated of the recipe. This could increase the
likelihood of collisions, but I'm not too worried. Let's get things
working, and determine the bottlenecks at that point.
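One way to read the suggestion above: hash the configuration metadata once, then fold that digest into each recipe's hash instead of re-hashing the configuration per recipe. The sketch below assumes the configuration and recipe variables are available as separate dicts; the function names are invented and this is not how any existing BitBake event handler is written.

```python
import hashlib

def digest_of(variables):
    """Hash a dict of variable names to values in a stable order."""
    digest = hashlib.sha256()
    for key in sorted(variables):
        digest.update(f"{key}={variables[key]}\n".encode("utf-8"))
    return digest.hexdigest()

def recipe_checksum(config_digest, recipe_variables):
    """Combine the precomputed configuration digest with a recipe's own digest."""
    digest = hashlib.sha256(config_digest.encode("utf-8"))
    digest.update(digest_of(recipe_variables).encode("utf-8"))
    return digest.hexdigest()

config = {"MACHINE": "qemuarm", "DISTRO": "angstrom"}
config_digest = digest_of(config)  # computed once, e.g. when the configuration is parsed

foo = recipe_checksum(config_digest, {"PN": "foo", "PV": "1.0"})
bar = recipe_checksum(config_digest, {"PN": "bar", "PV": "2.0"})

# A configuration change invalidates every recipe checksum.
new_config_digest = digest_of(dict(config, MACHINE="beagleboard"))
foo_new = recipe_checksum(new_config_digest, {"PN": "foo", "PV": "1.0"})

print(foo != bar, foo != foo_new)  # True True
```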
--
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics
* Re: Checksums in Bitbake
2010-03-24 15:51 ` Chris Larson
@ 2010-03-24 20:17 ` Frans Meulenbroeks
2010-03-24 21:46 ` Richard Purdie
0 siblings, 1 reply; 10+ messages in thread
From: Frans Meulenbroeks @ 2010-03-24 20:17 UTC (permalink / raw)
To: openembedded-devel
2010/3/24 Chris Larson <clarson@kergoth.com>:
> On Wed, Mar 24, 2010 at 8:15 AM, Frans Meulenbroeks <
> fransmeulenbroeks@gmail.com> wrote:
>
>> Interesting ideas.
>> I need to let this digest a little bit.
>>
>> Some initial thoughts
>> The checksum should also depend on the checksum of the underlying
>> packages. E.g. if A depends on B and the checksum of B changes it
>> should trigger a rebuild of A.
>>
>
> I don't think this is a very good idea, personally. As an option, perhaps,
> but we do things the way we do for a reason, just because a dep of mine is
> rebuilt doesn't automatically require that I be rebuilt. I'd suggest moving
> to an alternative which encodes the library ABI and incorporates that into
> the hashes of things that depend upon it, but we can certainly do what you
> want as an optional feature.
If a dep is rebuilt there is a reason for it (bug fix, packaging
changed, changes in exported files etc etc).
This might impact the using recipe.
If baking a file does not result in a rebuild when a dependency is
changed, probably a warning should be given.
Encoding the library ABI is only part of the job. You'd also have to
take the .h files a package exports into account as constants in it
could be changed.
And even the using package could change its behaviour (e.g. because
configure runs differently).
Note also that if we abandon PR we do not really have an easy
mechanism to force recompilation of a package (if a dependency changed
and we want to force a rebuild).
>
>> A first crude approach would be to have a hash of the concatenation of
>> the unfolded recipe (so with all includes/requires expanded) and the
>> hashes of the recipes it depends on). Of course this is very rough as
>> even changing whitespace in a recipe will lead to a recompile.
>> A different approach would be to let it depend on PV + PR. That'll put
>> the developer in control (with all related issues, like the developer
>> not bumping PR).
>> And yet a different one would be to use variables and functions from the
>> recipe.
>>
>> I have mixed feelings on whether checksums also would depend on global
>> vars (e.g. code generated by the classes or variables in e.g.
>> local.conf).
>> On the one hand it seems pretty neat, on the other hand I worry about
>> performance (calculating the checksum).
>>
>
> Global variables should absolutely be included, imo. The reason for going
> with a blacklist rather than a whitelist approach is to, as richard says,
> make it less error prone. It ensures that the failure mode is something
> being rebuilt, rather than using possibly incorrect binaries. I'd rather it
> take a bit longer to build than result in questionable output. If
> calculating the checksum time becomes a concern, which I doubt, you could
> hash the configuration metadata at ConfigParsed time and incorporate that
> hash into the hash generated of the recipe. This could increase the
> likelihood of collisions, but I'm not too worried. Let's get things
> working, and determine the bottlenecks at that point.
Agree, but as changes in vars are less likely we could consider having
something like a DISTRO_PR.
My nightmare is that if I am going to build console-image (about 3000
tasks) that it goes to check 3000 times if my TMPDIR is not changed.
Some caching will definitely be needed.
(btw if we have a checksum per file and have a rule that if the
checksum is newer than the file, it is ok and need not be recomputed
that could help, but it might bring back some of the issues we now
have with stamp).
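A sketch of a per-file checksum cache along these lines. Note that it uses a slightly different rule from the one described: rather than comparing the age of the checksum against the file, it recomputes whenever the file's recorded mtime has changed. That still relies on timestamps, so some of the stamp-file concerns remain. The class name and the counter are made up for the example.

```python
import hashlib
import os
import tempfile

class FileChecksumCache:
    """Cache file checksums, recomputing only when a file's mtime changes."""

    def __init__(self):
        self._entries = {}   # path -> (mtime_ns, digest)
        self.recomputed = 0  # how many times a file was actually read

    def checksum(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        self._entries[path] = (mtime, digest)
        self.recomputed += 1
        return digest

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.bb")
    with open(path, "w") as handle:
        handle.write('PV = "1.0"\n')
    cache = FileChecksumCache()
    first = cache.checksum(path)
    second = cache.checksum(path)  # served from the cache
    with open(path, "w") as handle:
        handle.write('PV = "1.1"\n')
    stat = os.stat(path)
    # Force a clearly newer mtime so the demo does not depend on clock resolution.
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
    third = cache.checksum(path)

print(first == second, first == third, cache.recomputed)  # True False 2
```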
Frans
* Re: Checksums in Bitbake
2010-03-24 20:17 ` Frans Meulenbroeks
@ 2010-03-24 21:46 ` Richard Purdie
2010-03-25 7:45 ` Frans Meulenbroeks
0 siblings, 1 reply; 10+ messages in thread
From: Richard Purdie @ 2010-03-24 21:46 UTC (permalink / raw)
To: openembedded-devel
On Wed, 2010-03-24 at 21:17 +0100, Frans Meulenbroeks wrote:
> 2010/3/24 Chris Larson <clarson@kergoth.com>:
> > On Wed, Mar 24, 2010 at 8:15 AM, Frans Meulenbroeks <
> > fransmeulenbroeks@gmail.com> wrote:
> >
> >> Interesting ideas.
> >> I need to let this digest a little bit.
> >>
> >> Some initial thoughts
> >> The checksum should also depend on the checksum of the underlying
> >> packages. E.g. if A depends on B and the checksum of B changes it
> >> should trigger a rebuild of A.
> >>
> >
> > I don't think this is a very good idea, personally. As an option, perhaps,
> > but we do things the way we do for a reason, just because a dep of mine is
> > rebuilt doesn't automatically require that I be rebuilt. I'd suggest moving
> > to an alternative which encodes the library ABI and incorporates that into
> > the hashes of things that depend upon it, but we can certainly do what you
> > want as an optional feature.
>
> If a dep is rebuilt there is a reason for it (bug fix, packaging
> changed, changes in exported files etc etc).
> This might impact the using recipe.
> If baking a file does not result in a rebuild when a dependency is
> changed, probably a warning should be given.
>
> Encoding the library ABI is only part of the job. You'd also have to
> take the .h files a package exports into account as constants in it
> could be changed.
> And even the using package could change its behaviour (e.g. because
> configure runs differently).
> Note also that if we abandon PR we do not really have an easy
> mechanism to force recompilation of a package (if a dependency changed
> and we want to force a rebuild).
It's worth noting you can enable this now with BB_STAMP_POLICY. I see
something similar being the likely outcome with checksums. Some people
will want the full dependency tree, some people won't. We can support
both just as we do now.
> > Global variables should absolutely be included, imo. The reason for going
> > with a blacklist rather than a whitelist approach is to, as richard says,
> > make it less error prone. It ensures that the failure mode is something
> > being rebuilt, rather than using possibly incorrect binaries. I'd rather it
> > take a bit longer to build than result in questionable output. If
> > calculating the checksum time becomes a concern, which I doubt, you could
> > hash the configuration metadata at ConfigParsed time and incorporate that
> > hash into the hash generated of the recipe. This could increase the
> > likelihood of collisions, but I'm not too worried. Let's get things
> > working, and determine the bottlenecks at that point.
>
> Agree, but as changes in vars are less likely we could consider having
> something to DISTRO_PR.
> My nightmare is that if I am going to build console-image (about 3000
> tasks) that it goes to check 3000 times if my TMPDIR is not changed.
> Some caching will definitely be needed
The checksums are likely to be constructed at parsing time. Compared to
the cost of building the datastore I'm hopeful the cost of building the
checksum will be low. The current caching algorithms will rebuild the
checksums when needed just fine.
Cheers,
Richard
* Re: Checksums in Bitbake
2010-03-24 21:46 ` Richard Purdie
@ 2010-03-25 7:45 ` Frans Meulenbroeks
2010-03-27 18:31 ` Reproducible builds (Was Re: Checksums in Bitbake) GNUtoo
0 siblings, 1 reply; 10+ messages in thread
From: Frans Meulenbroeks @ 2010-03-25 7:45 UTC (permalink / raw)
To: openembedded-devel
2010/3/24 Richard Purdie <rpurdie@rpsys.net>:
> On Wed, 2010-03-24 at 21:17 +0100, Frans Meulenbroeks wrote:
>> If a dep is rebuilt there is a reason for it (bug fix, packaging
>> changed, changes in exported files etc etc).
>> This might impact the using recipe.
>> If baking a file does not result in a rebuild when a dependency is
>> changed, probably a warning should be given.
>>
>> Encoding the library ABI is only part of the job. You'd also have to
>> take the .h files a package exports into account as constants in it
>> could be changed.
>> And even the using package could change its behaviour (e.g. because
>> configure runs differently).
>> Note also that if we abandon PR we do not really have an easy
>> mechanism to force recompilation of a package (if a dependency changed
>> and we want to force a rebuild).
>
> It's worth noting you can enable this now with BB_STAMP_POLICY. I see
> something similar being the likely outcome with checksums. Some people
> will want the full dependency tree, some people won't. We can support
> both just as we do now.
Supporting both is great.
The reason I brought this up is that for application of OE in
products, reproducibility is important.
I've seen too often (also outside OE) that two engineers take the same
source yet still get different results, and that bugs at a customer
site cannot be reproduced in the lab (and yes, I do know there are
other ways to tackle this problem).
The thing I would like to avoid is that different people create
packages/images that look like being the same (e.g. name/ID) yet have
different content.
Or in other words, the problem I want to avoid is this: A depends on B,
A is built and works, but afterwards B is changed, and a newly compiled A
has the same name/version yet might behave differently from the old one
(e.g. because B changed something in a .h file).
That could also lead to situations where a locally compiled file behaves
differently than the same one you'd get from a feed.
IMHO less desirable.
Frans
* Reproducible builds (Was Re: Checksums in Bitbake)
2010-03-25 7:45 ` Frans Meulenbroeks
@ 2010-03-27 18:31 ` GNUtoo
2010-03-27 18:54 ` Tom Rini
2010-03-29 10:42 ` Sander van Grieken
0 siblings, 2 replies; 10+ messages in thread
From: GNUtoo @ 2010-03-27 18:31 UTC (permalink / raw)
To: openembedded-devel
On Thu, 2010-03-25 at 08:45 +0100, Frans Meulenbroeks wrote:
> I've seen too often (also outside OE) that two engineers take the same
> source yet still get different results, and that bugs at a customer
> site cannot be reproduced in the lab (and yes, I do know there are
> other ways to tackle this problem)
Also:
bitbake optionaldep
bitbake package
And:
bitbake package
could result in different binaries/packages due to configure picking
optionaldep in the first case and not in the second one.
Maybe we should start hardcoding --without-optionaldep for all optional
dependencies that are not in DEPENDS?
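The suggestion above could be sketched as a small helper that turns every known optional dependency into an explicit configure flag, so that the result no longer depends on what happens to be in staging. The helper name and the `--with-`/`--without-` spelling are assumptions; individual packages name their configure options differently.

```python
def optional_dep_flags(optional_deps, depends):
    """Return an explicit --with-X or --without-X flag for each optional dep."""
    declared = set(depends.split())
    return [
        ("--with-" if dep in declared else "--without-") + dep
        for dep in sorted(optional_deps)
    ]

print(optional_dep_flags(["optionaldep", "zlib"], "zlib virtual/libc"))
# ['--without-optionaldep', '--with-zlib']
```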
OR...maybe packaged-staging could save us from that issue?
( http://marcin.juszkiewicz.com.pl/2008/07/01/packaged-staging-and-what-it-gives/ )
Denis.
* Re: Reproducible builds (Was Re: Checksums in Bitbake)
2010-03-27 18:31 ` Reproducible builds (Was Re: Checksums in Bitbake) GNUtoo
@ 2010-03-27 18:54 ` Tom Rini
2010-03-27 19:14 ` Koen Kooi
2010-03-29 10:42 ` Sander van Grieken
1 sibling, 1 reply; 10+ messages in thread
From: Tom Rini @ 2010-03-27 18:54 UTC (permalink / raw)
To: openembedded-devel
On Sat, 2010-03-27 at 19:31 +0100, GNUtoo wrote:
> On Thu, 2010-03-25 at 08:45 +0100, Frans Meulenbroeks wrote:
> > I've seen too often (also outside OE) that two engineers take the same
> > source yet still get different results, and that bugs at a customer
> > site cannot be reproduced in the lab (and yes, I do know there are
> > other ways to tackle this problem)
> Also:
> bitbake optionaldep
> bitbake package
> And:
> bitbake package
>
> could result in different binaries/packages due to configure picking
> optionaldep in the first case and not in the second one.
>
> Maybe we should start hardcoding --without-optionaldep for all optional
> dependencies that are not in DEPENDS?
>
> OR...maybe packaged-staging could save us from that issue?
>
> ( http://marcin.juszkiewicz.com.pl/2008/07/01/packaged-staging-and-what-it-gives/ )
pstaging catches the implicit required deps, but not the implicit
optional deps. IMHO it would be nice, and I think there's been a
general OK in this direction, to move towards DISTRO_FEATURES (or so?)
toggling --enable-<feature>. That's what's needed in this particular
case.
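A sketch of what feature-driven toggling might look like, assuming DISTRO_FEATURES is a space-separated string and each recipe lists the features its configure script understands. The `--disable-` counterpart and the helper name are assumptions added for the example.

```python
def feature_flags(distro_features, recipe_features):
    """Map each feature a recipe knows about to --enable-X or --disable-X."""
    enabled = set(distro_features.split())
    return [
        ("--enable-" if feature in enabled else "--disable-") + feature
        for feature in sorted(recipe_features)
    ]

print(feature_flags("alsa bluetooth", ["alsa", "ipv6"]))
# ['--enable-alsa', '--disable-ipv6']
```

Because every listed feature gets an explicit flag, configure no longer decides based on what it finds in staging.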
--
Tom Rini <tom_rini@mentor.com>
Mentor Graphics Corporation
* Re: Reproducible builds (Was Re: Checksums in Bitbake)
2010-03-27 18:54 ` Tom Rini
@ 2010-03-27 19:14 ` Koen Kooi
0 siblings, 0 replies; 10+ messages in thread
From: Koen Kooi @ 2010-03-27 19:14 UTC (permalink / raw)
To: openembedded-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 27-03-10 19:54, Tom Rini wrote:
> On Sat, 2010-03-27 at 19:31 +0100, GNUtoo wrote:
>> On Thu, 2010-03-25 at 08:45 +0100, Frans Meulenbroeks wrote:
>>> I've seen too often (also outside OE) that two engineers take the same
>>> source yet still get different results, and that bugs at a customer
>>> site cannot be reproduced in the lab (and yes, I do know there are
>>> other ways to tackle this problem)
>> Also:
>> bitbake optionaldep
>> bitbake package
>> And:
>> bitbake package
>>
>> could result in different binaries/packages due to configure picking
>> optionaldep in the first case and not in the second one.
>>
>> Maybe we should start hardcoding --without-optionaldep for all optional
>> dependencies that are not in DEPENDS?
>>
>> OR...maybe packaged-staging could save us from that issue?
>>
>> ( http://marcin.juszkiewicz.com.pl/2008/07/01/packaged-staging-and-what-it-gives/ )
>
> pstaging catches the implicit required deps, but not the implicit
> optional deps. IMHO it would be nice, and I think there's been a
> general OK in this direction, to move towards DISTRO_FEATURES (or so?)
> toggling --enable-<feature>. That's what's needed in this particular
> case.
DISTRO_FEATURES should only be used as a last resort, per-recipe staging
would be better :)
regards,
Koen
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
iD8DBQFLrlkjMkyGM64RGpERAg1jAJ4y8sgtx9IDeGxndv0SedzgpewLVwCggGTl
S6eFOiavvxEeRl3qlDQuwZk=
=JyEG
-----END PGP SIGNATURE-----
* Re: Reproducible builds (Was Re: Checksums in Bitbake)
2010-03-27 18:31 ` Reproducible builds (Was Re: Checksums in Bitbake) GNUtoo
2010-03-27 18:54 ` Tom Rini
@ 2010-03-29 10:42 ` Sander van Grieken
1 sibling, 0 replies; 10+ messages in thread
From: Sander van Grieken @ 2010-03-29 10:42 UTC (permalink / raw)
To: openembedded-devel
On Saturday 27 March 2010 19:31:49 GNUtoo wrote:
> On Thu, 2010-03-25 at 08:45 +0100, Frans Meulenbroeks wrote:
> > I've seen too often (also outside OE) that two engineers take the same
> > source yet still get different results, and that bugs at a customer
> > site cannot be reproduced in the lab (and yes, I do know there are
> > other ways to tackle this problem)
> Also:
> bitbake optionaldep
> bitbake package
> And:
> bitbake package
>
> could result in different binaries/packages due to configure picking
> optionaldep in the first case and not in the second one.
So, building any of the *-image recipes is non-deterministic, if it pulls in optional deps
for other recipes? It then comes down to the build-queue ordering, which is probably not
guaranteed to have a predictable order, especially with a high number of parallel bitbake
threads.
Sounds like there's a need for something similar to Gentoo's USE flags.
grtz,
Sander