* Should stamp files for different versions of a recipe exist at the same time?
From: Mike Crowe @ 2025-05-30 17:17 UTC
To: bitbake-devel
I seem to be running into a problem in Scarthgap that relates to outdated
stamp files not being removed. I think this results in tasks not running
when they should in later builds. I don't believe that this problem happens
with Dunfell[1], though it does still happen with current master (both
Bitbake and openembedded-core) too[2].
In case my description below isn't clear, the files required are available
in
https://github.com/mikecrowe/openembedded-core/tree/stamps-not-removed-repro
though
https://github.com/mikecrowe/openembedded-core/commit/8fd8055ab3e1d2d429c6076481952bace251750c
is probably the most interesting to look at.
My reproduction requires a package to have two different recipes with
different COMPATIBLE_MACHINEs and swapping between building for those two
different MACHINEs. I'm using dummy recipes called lictest_one.bb and
lictest_two.bb because I'm able to reproduce this problem with licence
files, though I suspect that it is not limited to them. I have two machines
named qemuarm64 and qemuarm64b.
# PREPARATION
I first start by running the cleansstate task for the recipe for both
MACHINEs in order to ensure we're starting from a sensible state:
MACHINE=qemuarm64 bitbake -c cleansstate lictest
MACHINE=qemuarm64b bitbake -c cleansstate lictest
# FIRST MACHINE BUILD 1
I build for the first machine:
MACHINE=qemuarm64 bitbake core-image-minimal
At this point tmp-glibc/deploy/licenses/cortexa57/lictest contains the
licence files and tmp-glibc/stamps/cortexa57-oe-linux/lictest/ contains
stamp files only from lictest_one.bb as expected.
# SECOND MACHINE BUILD 1
I build for the second machine:
MACHINE=qemuarm64b bitbake lictest
At this point tmp-glibc/deploy/licenses/cortexa57/lictest contains the
licence files from lictest_two.bb as expected.
tmp-glibc/stamps/cortexa57-oe-linux/lictest/ contains stamp files from both
recipes. This is unexpected to me as I would have expected the stamps from
lictest_one.bb to be removed due to them being unreachable.
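One way to see which recipe versions have left stamps behind is to group the files in the stamp directory by the part of the name before ".do_". The following is a minimal Python sketch and not part of the original report. The function name group_stamps and the demo file names are made up for illustration, and it assumes stamp file names have the shape <prefix>.do_<task>[.<hash>], as the stamp file name in the debug output further down suggests.

```python
import os
import tempfile
from collections import defaultdict


def group_stamps(stamp_dir):
    """Map each stamp prefix (the text before '.do_') to its sorted task names."""
    groups = defaultdict(set)
    for name in os.listdir(stamp_dir):
        prefix, sep, rest = name.partition(".do_")
        if not sep:
            continue  # not a task stamp, ignore it
        # Drop any trailing ".<hash>" (or ".sigdata.<hash>") after the task name.
        task = "do_" + rest.split(".", 1)[0]
        groups[prefix].add(task)
    return {prefix: sorted(tasks) for prefix, tasks in groups.items()}


# Demo with hypothetical stamp names in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in (
        "1.do_populate_lic.aaaa",
        "1.do_populate_lic_setscene.aaaa",
        "2.do_populate_lic.bbbb",
    ):
        open(os.path.join(demo_dir, name), "w").close()
    demo_groups = group_stamps(demo_dir)
    print(demo_groups)
```

More than one prefix in the result for a single recipe directory would correspond to the "stamp files from both recipes" situation described above.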
# FIRST MACHINE BUILD 2
I go back to build for the first machine again with some debug output:
MACHINE=qemuarm64 bitbake -DDD lictest
In that debug output I found:
DEBUG: Stampfile /fast/mac/git/oe-core/build/tmp-glibc/stamps/cortexa57-oe-linux/lictest/1.do_populate_lic_setscene.eba2f40dbe8904031c070aeff35abee46485bfa827e27731e02e8c5a84ef3a46 not available
DEBUG: Normal stamp current for task /fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic
DEBUG: Found task /fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic which could be accelerated
DEBUG: /fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic has a valid stamp, skipping
/fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic
DEBUG: Marking task /fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic as buildable
DEBUG: Setscene covered task /fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic
and tmp-glibc/deploy/licenses/cortexa57/lictest is either empty or it
contains the licence files from lictest_two! The stamps from lictest_two.bb
have been removed from tmp-glibc/stamps/cortexa57-oe-linux/lictest/, but
the ones from lictest_one.bb remain.
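To find every task that was skipped this way in a long -DDD log, the "has a valid stamp, skipping" lines can be filtered out mechanically. This is a small Python sketch added for illustration only. The helper name skipped_tasks is hypothetical, and the regular expression assumes the message format shown in the excerpt above; other BitBake versions may word the message differently.

```python
import re

# Matches lines such as:
# DEBUG: /path/to/recipe.bb:do_task has a valid stamp, skipping
SKIP_RE = re.compile(
    r"^DEBUG: (?P<recipe>\S+):(?P<task>do_\w+) has a valid stamp, skipping"
)


def skipped_tasks(log_lines):
    """Return (recipe, task) pairs for tasks skipped because of a valid stamp."""
    found = []
    for line in log_lines:
        match = SKIP_RE.match(line)
        if match:
            found.append((match.group("recipe"), match.group("task")))
    return found


# Two lines taken from the debug output quoted above.
sample = [
    "DEBUG: Normal stamp current for task /fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic",
    "DEBUG: /fast/mac/git/oe-core/meta/recipes-core/lictest/lictest_one.bb:do_populate_lic has a valid stamp, skipping",
]
result = skipped_tasks(sample)
print(result)
```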
My guess is that the problem here is that the stamps from the first machine
build weren't removed during the "SECOND MACHINE BUILD 1" step above. If I
remove them myself then the problem goes away. Is that theory correct? If
so, then I can start trying to work out why and any advice would be
welcome. If that theory is not correct then does anyone have any idea where
I should start investigating?
Thanks.
Mike.
[1] But with the fix for
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14123 cherry-picked.
[2] If the reproduction script is tweaked to replace tmp-glibc with tmp.
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
From: Quentin Schulz @ 2025-06-02 10:15 UTC
To: mac, bitbake-devel
Hi Mike,
On 5/30/25 7:17 PM, Mike Crowe via lists.openembedded.org wrote:
> At this point tmp-glibc/deploy/licenses/cortexa57/lictest contains the
> licence files and tmp-glibc/stamps/cortexa57-oe-linux/lictest/ contains
Intuitively I would say that your recipes also need to have PACKAGE_ARCH
= "${MACHINE_ARCH}" set and then this wouldn't be an issue at all I
believe (because different directories for each recipe). I'm not
entirely sure what is the recommendation for when to start using
PACKAGE_ARCH = "${MACHINE_ARCH}", like what's the thing that should
trigger this addition in the recipe?
I'm also not saying that this isn't a bug or a bigger issue, even if my
suggestion may work around the issue.
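The "different directories" point can be illustrated with a toy model of how the stamp directory is derived. This Python sketch is an editorial addition, not BitBake code. It assumes the stamp directory follows the pattern <stamps>/<PACKAGE_ARCH><TARGET_VENDOR>-<TARGET_OS>/<PN>, which matches the tmp-glibc/stamps/cortexa57-oe-linux/lictest/ path in the report, and it assumes that MACHINE_ARCH is simply the machine name for these two machines. The function name stamp_dir is hypothetical.

```python
def stamp_dir(stamps_root, package_arch, pn, target_vendor="-oe", target_os="linux"):
    """Simplified model: <stamps_root>/<PACKAGE_ARCH><TARGET_VENDOR>-<TARGET_OS>/<PN>."""
    return f"{stamps_root}/{package_arch}{target_vendor}-{target_os}/{pn}"


machines = ("qemuarm64", "qemuarm64b")

# Both machines share the tune architecture, so both recipes share one directory.
shared = {m: stamp_dir("tmp-glibc/stamps", "cortexa57", "lictest") for m in machines}

# With a machine-specific PACKAGE_ARCH (assumed equal to the machine name here),
# each machine gets its own stamp directory.
per_machine = {m: stamp_dir("tmp-glibc/stamps", m, "lictest") for m in machines}

print(shared)
print(per_machine)
```

In this model the collision disappears as soon as the two recipes stop sharing a PACKAGE_ARCH, whether that value is machine-specific or a custom architecture name.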
Cheers,
Quentin
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
From: Mike Crowe @ 2025-06-02 10:49 UTC
To: Quentin Schulz, bitbake-devel
On Monday 02 June 2025 at 12:15:57 +0200, Quentin Schulz wrote:
> Hi Mike,
>
> Intuitively I would say that your recipes also need to have PACKAGE_ARCH =
> "${MACHINE_ARCH}" set and then this wouldn't be an issue at all I believe
> (because different directories for each recipe). I'm not entirely sure what
> is the recommendation for when to start using PACKAGE_ARCH =
> "${MACHINE_ARCH}", like what's the thing that should trigger this addition
> in the recipe?
Hi Quentin,
Thanks. I had been holding that back as a somewhat-hacky solution to the
problem. It would reduce the number of setscene tasks running when
switching, though.
In reality the situation is more complex than the one I described above. We
actually have multiple machines and two sets of recipes. Some machines use
one set of recipes and some use the other set. Using PACKAGE_ARCH =
"${MACHINE_ARCH}" might work, but it would result in more builds than
necessary. I had wondered whether setting PACKAGE_ARCH =
"${TUNE_PKGARCH}-suffix" or PACKAGE_ARCH:append = "-suffix" or similar
would work, but I haven't tested that.
It's also not been clear to me where the divide between PACKAGE_ARCH =
"${TUNE_PKGARCH}" and PACKAGE_ARCH = "${MACHINE_ARCH}" is. There are lots
of ways (e.g. setting PACKAGECONFIG:pn-package) that could cause a
PACKAGE_ARCH = "${TUNE_PKGARCH}"-specific recipe to vary between MACHINEs.
> I'm also not saying that this isn't a bug or a bigger issue, even if my
> suggestion may work around the issue.
Thanks for the suggestion. If Dunfell had behaved as Scarthgap or master
currently do then we probably would have just considered that to be the
expected behaviour.
Mike.
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
From: Richard Purdie @ 2025-06-02 14:17 UTC
To: mac, Quentin Schulz, bitbake-devel
On Mon, 2025-06-02 at 11:49 +0100, Mike Crowe via lists.openembedded.org wrote:
> On Monday 02 June 2025 at 12:15:57 +0200, Quentin Schulz wrote:
> > Hi Mike,
> >
> > Intuitively I would say that your recipes also need to have PACKAGE_ARCH =
> > "${MACHINE_ARCH}" set and then this wouldn't be an issue at all I believe
> > (because different directories for each recipe). I'm not entirely sure what
> > is the recommendation for when to start using PACKAGE_ARCH =
> > "${MACHINE_ARCH}", like what's the thing that should trigger this addition
> > in the recipe?
>
> Hi Quentin,
>
> Thanks. I had been holding that back as a somewhat-hacky solution to the
> problem. It would reduce the number of setscene tasks running when
> switching, though.
>
> In reality the situation is more complex than the one I described above. We
> actually have multiple machines and two sets of recipes. Some machines use
> one set of recipes and some use the other set. Using PACKAGE_ARCH =
> "${MACHINE_ARCH}" might work, but it would result in more builds than
> necessary. I had wondered whether setting PACKAGE_ARCH =
> "${TUNE_PKGARCH}-suffix" or PACKAGE_ARCH:append = "-suffix" or similar
> would work, but I haven't tested that.
>
> It's also not been clear to me where the divide between PACKAGE_ARCH =
> "${TUNE_PKGARCH}" and PACKAGE_ARCH = "${MACHINE_ARCH}" is. There are lots
> of ways (e.g. setting PACKAGECONFIG:pn-package) that could cause a
> PACKAGE_ARCH = "${TUNE_PKGARCH}"-specific recipe to vary between MACHINEs.
>
> > I'm also not saying that this isn't a bug or a bigger issue, even if my
> > suggestion may work around the issue.
>
> Thanks for the suggestion. If Dunfell had behaved as Scarthgap or master
> currently do then we probably would have just considered that to be the
> expected behaviour.
I have a feeling there were some changes in this area to fix bugs but
I'm not remembering specifics. It sounds like some issue could have
crept in.
What you're supposed to be able to do is define your own package
architectures so that you could have two here, one for each of your
configs. You could then switch between them and the appropriate
packages would be used.
As it stands, you're relying on the sstate code to swap things out,
which is meant to be a fallback, not a default way of operating. Far
too many people rely on that doing the right thing now though.
So I'd say there is probably a bug somewhere here but you're also not
using the system quite as intended.
Thanks for a test case btw, that really helps. I will try and take a
look if I can find time but that is hard atm :/.
Cheers,
Richard
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
From: Mike Crowe @ 2025-06-02 16:37 UTC
To: Richard Purdie, bitbake-devel
On Monday 02 June 2025 at 15:17:44 +0100, Richard Purdie wrote:
> On Mon, 2025-06-02 at 11:49 +0100, Mike Crowe via lists.openembedded.org wrote:
> > On Monday 02 June 2025 at 12:15:57 +0200, Quentin Schulz wrote:
> > > Hi Mike,
> > >
> > > Intuitively I would say that your recipes also need to have PACKAGE_ARCH =
> > > "${MACHINE_ARCH}" set and then this wouldn't be an issue at all I believe
> > > (because different directories for each recipe). I'm not entirely sure what
> > > is the recommendation for when to start using PACKAGE_ARCH =
> > > "${MACHINE_ARCH}", like what's the thing that should trigger this addition
> > > in the recipe?
> >
> > Hi Quentin,
> >
> > Thanks. I had been holding that back as a somewhat-hacky solution to the
> > problem. It would reduce the number of setscene tasks running when
> > switching, though.
> >
> > In reality the situation is more complex than the one I described above. We
> > actually have multiple machines and two sets of recipes. Some machines use
> > one set of recipes and some use the other set. Using PACKAGE_ARCH =
> > "${MACHINE_ARCH}" might work, but it would result in more builds than
> > necessary. I had wondered whether setting PACKAGE_ARCH =
> > "${TUNE_PKGARCH}-suffix" or PACKAGE_ARCH:append = "-suffix" or similar
> > would work, but I haven't tested that.
> >
> > It's also not been clear to me where the divide between PACKAGE_ARCH =
> > "${TUNE_PKGARCH}" and PACKAGE_ARCH = "${MACHINE_ARCH}" is. There are lots
> > of ways (e.g. setting PACKAGECONFIG:pn-package) that could cause a
> > PACKAGE_ARCH = "${TUNE_PKGARCH}"-specific recipe to vary between MACHINEs.
> >
> > > I'm also not saying that this isn't a bug or a bigger issue, even if my
> > > suggestion may work around the issue.
> >
> > Thanks for the suggestion. If Dunfell had behaved as Scarthgap or master
> > currently do then we probably would have just considered that to be the
> > expected behaviour.
>
> I have a feeling there were some changes in this area to fix bugs but
> I'm not remembering specifics. It sounds like some issue could have
> crept in.
>
> What you're supposed to be able to do is define your own package
> architectures so that you could have two here, one for each of your
> configs. You could then switch between them and the appropriate
> packages would be used.
The trouble with this is it requires everything that depends on these
packages to also use those package architectures otherwise the problem just
moves up a level. Recipes changing for the same PACKAGE_ARCH is also not
very discoverable in the general case (though in this one it is quite
obvious).
> As it stands, you're relying on the sstate code to swap things out,
> which is meant to be a fallback, not a default way of operating. Far
> too many people rely on that doing the right thing now though.
Our situation is actually even worse than I described above. One set of
machines has multilib enabled and one doesn't, yet they share a
TUNE_PKGARCH so the sstate shuffling happens for every package! Given your
response it sounds like we should stop doing this.
How would you recommend defining our own package architecture in this case?
1. TUNE_PKGARCH:append = "-oursuffix"? (Which would also affect
SSTATE_ARCHS_TUNEPKG.)
2. Set the default PACKAGE_ARCH ?= "${TUNE_PKGARCH}-oursuffix" overriding
the ??= default in bitbake.conf unless someone else overrides it?
3. Something else?
(I had a look through the docs but couldn't find any advice and the
multilib examples don't appear to do this.)
> So I'd say there is probably a bug somewhere here but you're also not
> using the system quite as intended.
> Thanks for a test case btw, that really helps. I will try and take a
> look if I can find time but that is hard atm :/.
In my original message I wrote:
>>> My guess is that the problem here is that the stamps from the first
>>> machine build weren't removed during the "SECOND MACHINE BUILD 1" step
>>> above. If I remove them myself then the problem goes away. Is that
>>> theory correct? If so, then I can start trying to work out why and any
>>> advice would be welcome. If that theory is not correct then does anyone
>>> have any idea where I should start investigating?
I was hoping that someone would just know the answer to my first question
above off the top of their head. If so then I'm willing to try digging into
this myself to see what I can discover. I just need to make sure that I'm
chasing the right part of the problem.
Thanks.
Mike.
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
From: Richard Purdie @ 2025-06-02 21:04 UTC
To: Mike Crowe, bitbake-devel
On Mon, 2025-06-02 at 17:37 +0100, Mike Crowe wrote:
> On Monday 02 June 2025 at 15:17:44 +0100, Richard Purdie wrote:
> > On Mon, 2025-06-02 at 11:49 +0100, Mike Crowe via lists.openembedded.org wrote:
> > > On Monday 02 June 2025 at 12:15:57 +0200, Quentin Schulz wrote:
> > > > Hi Mike,
> > > >
> > > > Intuitively I would say that your recipes also need to have PACKAGE_ARCH =
> > > > "${MACHINE_ARCH}" set and then this wouldn't be an issue at all I believe
> > > > (because different directories for each recipe). I'm not entirely sure what
> > > > is the recommendation for when to start using PACKAGE_ARCH =
> > > > "${MACHINE_ARCH}", like what's the thing that should trigger this addition
> > > > in the recipe?
> > >
> > > Hi Quentin,
> > >
> > > Thanks. I had been holding that back as a somewhat-hacky solution to the
> > > problem. It would reduce the number of setscene tasks running when
> > > switching, though.
> > >
> > > In reality the situation is more complex than the one I described above. We
> > > actually have multiple machines and two sets of recipes. Some machines use
> > > one set of recipes and some use the other set. Using PACKAGE_ARCH =
> > > "${MACHINE_ARCH}" might work, but it would result in more builds than
> > > necessary. I had wondered whether setting PACKAGE_ARCH =
> > > "${TUNE_PKGARCH}-suffix" or PACKAGE_ARCH:append = "-suffix" or similar
> > > would work, but I haven't tested that.
> > >
> > > It's also not been clear to me where the divide between PACKAGE_ARCH =
> > > "${TUNE_PKGARCH}" and PACKAGE_ARCH = "${MACHINE_ARCH}" is. There are lots
> > > of ways (e.g. setting PACKAGECONFIG:pn-package) that could cause a
> > > PACKAGE_ARCH = "${TUNE_PKGARCH}"-specific recipe to vary between MACHINEs.
> > >
> > > > I'm also not saying that this isn't a bug or a bigger issue, even if my
> > > > suggestion may work around the issue.
> > >
> > > Thanks for the suggestion. If Dunfell had behaved as Scarthgap or master
> > > currently do then we probably would have just considered that to be the
> > > expected behaviour.
> >
> > I have a feeling there were some changes in this area to fix bugs but
> > I'm not remembering specifics. It sounds like some issue could have
> > crept in.
> >
> > What you're supposed to be able to do is define your own package
> > architectures so that you could have two here, one for each of your
> > configs. You could then switch between them and the appropriate
> > packages would be used.
>
> The trouble with this is it requires everything that depends on these
> packages to also use those package architectures otherwise the problem just
> moves up a level. Recipes changing for the same PACKAGE_ARCH is also not
> very discoverable in the general case (though in this one it is quite
> obvious).
You've changed the configuration for them so it is only right they
should have a different PACKAGE_ARCH. This kind of thing does ripple
badly but that is the implication of a change like this.
Let me explain this slightly differently. We once decided that it was
ok to change MACHINE and build in the same TMPDIR (multimachine
support). It was decided that changing DISTRO still meant a new TMPDIR
though. The implication of this is that if you change the
configuration/policy of a package (which is effectively distro policy),
it should be in a different TMPDIR.
Over time, our error correction handling got better, to the point that
if you don't change TMPDIR but change DISTRO, it does mostly work. The
key bit is mostly. Ideally we'd catch all cases and we don't do badly
but it sounds like we're missing something here.
The way to officially handle what you describe is through PACKAGE_ARCH,
even if that is a bit painful (or separate TMPDIR).
> > As it stands, you're relying on the sstate code to swap things out,
> > which is meant to be a fallback, not a default way of operating. Far
> > too many people rely on that doing the right thing now though.
>
> Our situation is actually even worse than I described above. One set of
> machines has multilib enabled and one doesn't, yet they share a
> TUNE_PKGARCH so the sstate shuffling happens for every package! Given your
> response it sounds like we should stop doing this.
>
> How would you recommend defining our own package architecture in this case?
>
> 1. TUNE_PKGARCH:append = "-oursuffix"? (Which would also affect
> SSTATE_ARCHS_TUNEPKG.)
You want sstate arch to be different so this is probably fine.
> 2. Set the default PACKAGE_ARCH ?= "${TUNE_PKGARCH}-oursuffix" overriding
> the ??= default in bitbake.conf unless someone else overrides it?
That would change all recipes and you should only change the recipes
affected and their dependencies.
If it is all recipes, you want a separate TMPDIR and it is really a
different distro (or sub-distro).
> 3. Something else?
>
> (I had a look through the docs but couldn't find any advice and the
> multilib examples don't appear to do this.)
FWIW multilib changes PACKAGE_ARCH.
> > So I'd say there is probably a bug somewhere here but you're also not
> > using the system quite as intended.
>
> > Thanks for a test case btw, that really helps. I will try and take a
> > look if I can find time but that is hard atm :/.
>
> In my original message I wrote:
> > > > My guess is that the problem here is that the stamps from the first
> > > > machine build weren't removed during the "SECOND MACHINE BUILD 1" step
> > > > above. If I remove them myself then the problem goes away. Is that
> > > > theory correct? If so, then I can start trying to work out why and any
> > > > advice would be welcome. If that theory is not correct then does anyone
> > > > have any idea where I should start investigating?
>
> I was hoping that someone would just know the answer to my first question
> above off the top of their head. If so then I'm willing to try digging into
> this myself to see what I can discover. I just need to make sure that I'm
> chasing the right part of the problem.
I've avoided answering this as I think you're right but I'm not 100%
sure and I don't want to send you off in the wrong direction. There are
reasons it might not wipe out stamps and those reasons may be
intentional but without looking at what is going on I couldn't be
completely sure. So I think you're right and I'd chase that way but...
Cheers,
Richard
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
From: Mike Crowe @ 2025-06-03 10:07 UTC
To: Richard Purdie, bitbake-devel
On Monday 02 June 2025 at 22:04:39 +0100, Richard Purdie wrote:
> On Mon, 2025-06-02 at 17:37 +0100, Mike Crowe wrote:
> > On Monday 02 June 2025 at 15:17:44 +0100, Richard Purdie wrote:
> > > On Mon, 2025-06-02 at 11:49 +0100, Mike Crowe via lists.openembedded.org wrote:
> > > > On Monday 02 June 2025 at 12:15:57 +0200, Quentin Schulz wrote:
> > > > > Hi Mike,
> > > > >
> > > > > On 5/30/25 7:17 PM, Mike Crowe via lists.openembedded.org wrote:
> > > > > > I seem to be running into a problem in Scarthgap that relates to outdated
> > > > > > stamp files not being removed. I think this results in tasks not running
> > > > > > when they should in later builds. I don't believe that this problem happens
> > > > > > with Dunfell[1], though it does still happen with current master (both
> > > > > > Bitbake and openembedded-core) too[2].
> > > > > >
> > > > > > In case my description below isn't clear, the files required are available
> > > > > > in
> > > > > > https://github.com/mikecrowe/openembedded-core/tree/stamps-not-removed-repro
> > > > > > though
> > > > > > https://github.com/mikecrowe/openembedded-core/commit/8fd8055ab3e1d2d429c6076481952bace251750c
> > > > > > is probably the most interesting to look at.
> > > > > >
> > > > > > My reproduction requires a package to have two different recipes with
> > > > > > different COMPATIBLE_MACHINEs and swapping between building for those two
> > > > > > different MACHINEs. I'm using dummy recipes called lictest_one.bb and
> > > > > > lictest_two.bb because I'm able to reproduce this problem with licence
> > > > > > files, though I suspect that it is not limited to them. I have two machines
> > > > > > named qemuarm64 and qemuarm64b.
> > > > > >
> > > > > > # PREPARATION
> > > > > >
> > > > > > I first start by running the cleansstate task for the recipe for both
> > > > > > MACHINEs in order to ensure we're starting from a sensible state:
> > > > > >
> > > > > > � MACHINE=qemuarm64 bitbake -c cleansstate lictest
> > > > > > � MACHINE=qemuarm64b bitbake -c cleansstate lictest
> > > > > >
> > > > > > # FIRST MACHINE BUILD 1
> > > > > >
> > > > > > I build for the first machine:
> > > > > >
> > > > > > � MACHINE=qemuarm64 bitbake core-image-minimal
> > > > > >
> > > > > > At this point tmp-glibc/deploy/licenses/cortexa57/lictest contains the
> > > > > > licence files and tmp-glibc/stamps/cortexa57-oe-linux/lictest/ contains
> > > > >
> > > > > Intuitively I would say that your recipes also need to have PACKAGE_ARCH =
> > > > > "${MACHINE_ARCH}" set and then this wouldn't be an issue at all I believe
> > > > > (because different directories for each recipe). I'm not entirely sure what
> > > > > is the recommendation for when to start using PACKAGE_ARCH =
> > > > > "${MACHINE_ARCH}", like what's the thing that should trigger this addition
> > > > > in the recipe?
> > > >
> > > > Hi Quentin,
> > > >
> > > > Thanks. I had been holding that back as a somewhat-hacky solution to the
> > > > problem. It would reduce the number of setscene tasks running when switching
> > > > though.
> > > >
> > > > In reality the situation is more complex than the one I described above. We
> > > > actually have multiple machines and two sets of recipes. Some machines use
> > > > one set of recipes and some use the other set. Using PACKAGE_ARCH =
> > > > "${MACHINE_ARCH}" might work, but it would result in more builds than
> > > > necessary. I had considered whether setting PACKAGE_ARCH =
> > > > "${TUNE_PKGARCH}-suffix" or PACKAGE_ARCH:append = "-suffix" or similar
> > > > would work, but I haven't tested that.
> > > >
> > > > It's also not been clear to me where the divide between PACKAGE_ARCH =
> > > > "${TUNE_PKGARCH}" and PACKAGE_ARCH = "${MACHINE_ARCH}" is. There are lots
> > > > of ways (e.g. setting PACKAGECONFIG:pn-package) that could cause a
> > > > PACKAGE_ARCH = "${TUNE_PKGARCH}"-specific recipe to vary between MACHINEs.
> > > >
> > > > > I'm also not saying that this isn't a bug or a bigger issue, even if my
> > > > > suggestion may work around the issue.
> > > >
> > > > Thanks for the suggestion. If Dunfell had behaved as Scarthgap or master
> > > > currently do then we probably would have just considered that to be the
> > > > expected behaviour.
> > >
> > > I have a feeling there were some changes in this area to fix bugs but
> > > I'm not remembering specifics. It sounds like some issue could have
> > > crept in.
> > >
> > > What you're supposed to be able to do is define your own package
> > > architectures so that you could have two here, one for each of your
> > > configs. You could then switch between them and the appropriate
> > > packages would be used.
> >
> > The trouble with this is it requires everything that depends on these
> > packages to also use those package architectures otherwise the problem just
> > moves up a level. Recipes changing for the same PACKAGE_ARCH is also not
> > very discoverable in the general case (though in this one it is quite
> > obvious).
>
> You've changed the configuration for them so it is only right they
> should have a different PACKAGE_ARCH. This kind of thing does ripple
> badly but that is the implication of a change like this.
I agree. I just worry that if nothing breaks most of the time when we fail
to set the different PACKAGE_ARCH on a reverse dependency then we'll not
notice the mistake. Maybe that doesn't matter until it breaks though?
The recipes involved in our case are gstreamer ones, which are depended
upon by many other random recipes in other layers. We _can_ add a .bbappend
for each of them that we use, but there's always a risk that we'll not
notice when someone adds a new dependency. Maybe we could come up with some
tooling to detect the situation?
> Let me explain this slightly differently. We once decided that it was
> ok to change MACHINE and build in the same TMPDIR (multimachine
> support). It was decided that changing DISTRO still meant a new TMPDIR
> though. The implication of this is that if you change the
> configuration/policy of a package (which is effectively distro policy),
> it should be in a different TMPDIR.
>
> Over time, our error correction handling got better, to the point that
> if you don't change TMPDIR but change DISTRO, it does mostly work. The
> key bit is mostly. Ideally we'd catch all cases and we don't do badly
> but it sounds like we're missing something here.
At least in Dunfell, and probably for a while before that, I think that you
did such a good job that my expectations had been that this worked.
> The way to officially handle what you describe is through PACKAGE_ARCH,
> even if that is a bit painful (or separate TMPDIR).
>
> > > As it stands, you're relying on the sstate code to swap things out,
> > > which is meant to be a fallback, not a default way of operating. Far
> > > too many people rely on that doing the right thing now though.
> >
> > Our situation is actually even worse than I described above. One set of
> > machines has multilib enabled and one doesn't, yet they share a
> > TUNE_PKGARCH so the sstate shuffling happens for every package! Given your
> > response it sounds like we should stop doing this.
I shouldn't have said 'every package' there. I should have said 'every
package where PACKAGE_ARCH = "${TUNE_PKGARCH}"'. In particular
${MACHINE_ARCH}, native and allarch packages don't need shuffling.
> > How would you recommend defining our own package architecture in this case?
> >
> > 1. TUNE_PKGARCH:append = "-oursuffix"? (Which would also affect
> >    SSTATE_ARCHS_TUNEPKG.)
>
> You want sstate arch to be different so this is probably fine.
OK. I did run into some problems when I tried it. I think that option 2
will probably be better anyway.
> > 2. Set the default PACKAGE_ARCH ?= "${TUNE_PKGARCH}-oursuffix" overriding
> >    the ??= default in bitbake.conf unless someone else overrides it?
>
> That would change all recipes and you should only change the recipes
> affected and their dependencies.
>
> If it is all recipes, you want a separate TMPDIR and it is really a
> different distro (or sub-distro).
All recipes that currently have PACKAGE_ARCH set to "${TUNE_PKGARCH}" are
affected. Not allarch ones, not "${MACHINE_ARCH}" ones and, most
importantly for saving time and space, not native ones, so using a
separate TMPDIR would be less than ideal.
> > 3. Something else?
> >
> > (I had a look through the docs but couldn't find any advice and the
> > multilib examples don't appear to do this.)
>
> FWIW multilib changes PACKAGE_ARCH.
I'm sorry but I realise now that I've been extremely unclear here. When I
said multilib, I didn't mean that side of multilib that has the
${MLPREFIX}, I meant the side that doesn't. You'd think that enabling
multilib wouldn't have an effect on the non-multilib recipes, but it does
for aarch64 because doing so changes (at least) libdir. :(
For the MACHINE that enables multilib, we have in our machine
configuration:
DEFAULTTUNE = "cortexa72-cortexa53-crypto"
require conf/machine/include/tune-cortexa72-cortexa53.inc
MULTILIBS = "multilib:lib32"
DEFAULTTUNE:virtclass-multilib-lib32 = "armv7at-neon"
tune-cortexa72-cortexa53.inc contains:
BASE_LIB:tune-cortexa72-cortexa53-crypto="lib64"
which means that multilib.conf sets libdir based on that to "/usr/lib64".
For the MACHINE that doesn't enable multilib, we have none of that so
bitbake.conf just uses BASELIB to set libdir to "/usr/lib".
I'm not sure if this change is intentional, but it does mean that merely
enabling multilib and not using it changes behaviour.
For our use, we'd have been much happier with libdir always being
"/usr/lib" but we unfortunately got baked into the default behaviour early
on.
[snip]
> > In my original message I wrote:
> > > > > My guess is that the problem here is that the stamps from the first
> > > > > machine build weren't removed during the "SECOND MACHINE BUILD 1" step
> > > > > above. If I remove them myself then the problem goes away. Is that
> > > > > theory correct? If so, then I can start trying to work out why and any
> > > > > advice would be welcome. If that theory is not correct then does anyone
> > > > > have any idea where I should start investigating?
> >
> > I was hoping that someone would just know the answer to my first question
> > above off the top of their head. If so then I'm willing to try digging into
> > this myself to see what I can discover. I just need to make sure that I'm
> > chasing the right part of the problem.
>
> I've avoided answering this as I think you're right but I'm not 100%
> sure and I don't want to send you off in the wrong direction. There are
> reasons it might not wipe out stamps and those reasons may be
> intentional but without looking at what is going on I couldn't be
> completely sure. So I think you're right and I'd chase that way but...
Understood. Thank you for your patience with me.
Mike.
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
2025-06-03 10:07 ` Mike Crowe
@ 2025-06-03 10:18 ` Richard Purdie
0 siblings, 0 replies; 14+ messages in thread
From: Richard Purdie @ 2025-06-03 10:18 UTC (permalink / raw)
To: Mike Crowe, bitbake-devel
On Tue, 2025-06-03 at 11:07 +0100, Mike Crowe wrote:
> On Monday 02 June 2025 at 22:04:39 +0100, Richard Purdie wrote:
> > On Mon, 2025-06-02 at 17:37 +0100, Mike Crowe wrote:
> > > On Monday 02 June 2025 at 15:17:44 +0100, Richard Purdie wrote:
> > > > On Mon, 2025-06-02 at 11:49 +0100, Mike Crowe via lists.openembedded.org wrote:
> > > > > On Monday 02 June 2025 at 12:15:57 +0200, Quentin Schulz wrote:
> > > > > > Hi Mike,
> > > > > >
> > > > > > On 5/30/25 7:17 PM, Mike Crowe via lists.openembedded.org wrote:
> > > > > > > I seem to be running into a problem in Scarthgap that relates to outdated
> > > > > > > stamp files not being removed. I think this results in tasks not running
> > > > > > > when they should in later builds. I don't believe that this problem happens
> > > > > > > with Dunfell[1], though it does still happen with current master (both
> > > > > > > Bitbake and openembedded-core) too[2].
> > > > > > >
> > > > > > > In case my description below isn't clear, the files required are available
> > > > > > > in
> > > > > > > https://github.com/mikecrowe/openembedded-core/tree/stamps-not-removed-repro
> > > > > > > though
> > > > > > > https://github.com/mikecrowe/openembedded-core/commit/8fd8055ab3e1d2d429c6076481952bace251750c
> > > > > > > is probably the most interesting to look at.
> > > > > > >
> > > > > > > My reproduction requires a package to have two different recipes with
> > > > > > > different COMPATIBLE_MACHINEs and swapping between building for those two
> > > > > > > different MACHINEs. I'm using dummy recipes called lictest_one.bb and
> > > > > > > lictest_two.bb because I'm able to reproduce this problem with licence
> > > > > > > files, though I suspect that it is not limited to them. I have two machines
> > > > > > > named qemuarm64 and qemuarm64b.
> > > > > > >
> > > > > > > # PREPARATION
> > > > > > >
> > > > > > > I first start by running the cleansstate task for the recipe for both
> > > > > > > MACHINEs in order to ensure we're starting from a sensible state:
> > > > > > >
> > > > > > > MACHINE=qemuarm64 bitbake -c cleansstate lictest
> > > > > > > MACHINE=qemuarm64b bitbake -c cleansstate lictest
> > > > > > >
> > > > > > > # FIRST MACHINE BUILD 1
> > > > > > >
> > > > > > > I build for the first machine:
> > > > > > >
> > > > > > > MACHINE=qemuarm64 bitbake core-image-minimal
> > > > > > >
> > > > > > > At this point tmp-glibc/deploy/licenses/cortexa57/lictest contains the
> > > > > > > licence files and tmp-glibc/stamps/cortexa57-oe-linux/lictest/ contains
> > > > > >
> > > > > > Intuitively I would say that your recipes also need to have PACKAGE_ARCH =
> > > > > > "${MACHINE_ARCH}" set and then this wouldn't be an issue at all I believe
> > > > > > (because different directories for each recipe). I'm not entirely sure what
> > > > > > is the recommendation for when to start using PACKAGE_ARCH =
> > > > > > "${MACHINE_ARCH}", like what's the thing that should trigger this addition
> > > > > > in the recipe?
> > > > >
> > > > > Hi Quentin,
> > > > >
> > > > > Thanks. I had been holding that back as a somewhat-hacky solution to the
> > > > > > problem. It would reduce the number of setscene tasks running when switching
> > > > > though.
> > > > >
> > > > > In reality the situation is more complex than the one I described above. We
> > > > > actually have multiple machines and two sets of recipes. Some machines use
> > > > > one set of recipes and some use the other set. Using PACKAGE_ARCH =
> > > > > "${MACHINE_ARCH}" might work, but it would result in more builds than
> > > > > > necessary. I had considered whether setting PACKAGE_ARCH =
> > > > > "${TUNE_PKGARCH}-suffix" or PACKAGE_ARCH:append = "-suffix" or similar
> > > > > would work, but I haven't tested that.
> > > > >
> > > > > It's also not been clear to me where the divide between PACKAGE_ARCH =
> > > > > "${TUNE_PKGARCH}" and PACKAGE_ARCH = "${MACHINE_ARCH}" is. There are lots
> > > > > of ways (e.g. setting PACKAGECONFIG:pn-package) that could cause a
> > > > > PACKAGE_ARCH = "${TUNE_PKGARCH}"-specific recipe to vary between MACHINEs.
> > > > >
> > > > > > I'm also not saying that this isn't a bug or a bigger issue, even if my
> > > > > > suggestion may work around the issue.
> > > > >
> > > > > Thanks for the suggestion. If Dunfell had behaved as Scarthgap or master
> > > > > currently do then we probably would have just considered that to be the
> > > > > expected behaviour.
> > > >
> > > > I have a feeling there were some changes in this area to fix bugs but
> > > > I'm not remembering specifics. It sounds like some issue could have
> > > > crept in.
> > > >
> > > > What you're supposed to be able to do is define your own package
> > > > architectures so that you could have two here, one for each of your
> > > > configs. You could then switch between them and the appropriate
> > > > packages would be used.
> > >
> > > The trouble with this is it requires everything that depends on these
> > > packages to also use those package architectures otherwise the problem just
> > > moves up a level. Recipes changing for the same PACKAGE_ARCH is also not
> > > very discoverable in the general case (though in this one it is quite
> > > obvious).
> >
> > You've changed the configuration for them so it is only right they
> > should have a different PACKAGE_ARCH. This kind of thing does ripple
> > badly but that is the implication of a change like this.
>
> I agree. I just worry that if nothing breaks most of the time when we fail
> to set the different PACKAGE_ARCH on a reverse dependency then we'll not
> notice the mistake. Maybe that doesn't matter until it breaks though?
>
> The recipes involved in our case are gstreamer ones, which are depended
> upon by many other random recipes in other layers. We _can_ add a .bbappend
> for each of them that we use, but there's always a risk that we'll not
> notice when someone adds a new dependency. Maybe we could come up with some
> tooling to detect the situation?
We have tests in core for some of these kinds of scenarios, see the
sstate tests in meta/lib/oeqa/selftest/cases/sstatetests.py (run with
oe-selftest -r sstatetests -j X).
The basic idea is to run two "bitbake XXX -S none" commands on the
different configs and then analyse the stamps directories.
> > Let me explain this slightly differently. We once decided that it was
> > ok to change MACHINE and build in the same TMPDIR (multimachine
> > support). It was decided that changing DISTRO still meant a new TMPDIR
> > though. The implication of this is that if you change the
> > configuration/policy of a package (which is effectively distro policy),
> > it should be in a different TMPDIR.
> >
> > Over time, our error correction handling got better, to the point that
> > if you don't change TMPDIR but change DISTRO, it does mostly work. The
> > key bit is mostly. Ideally we'd catch all cases and we don't do badly
> > but it sounds like we're missing something here.
>
> At least in Dunfell, and probably for a while before that, I think that you
> did such a good job that my expectations had been that this worked.
We try to make it work but there are so many ways you could in theory
break it...
> > The way to officially handle what you describe is through PACKAGE_ARCH,
> > even if that is a bit painful (or separate TMPDIR).
> >
> > > > As it stands, you're relying on the sstate code to swap things out,
> > > > which is meant to be a fallback, not a default way of operating. Far
> > > > too many people rely on that doing the right thing now though.
> > >
> > > Our situation is actually even worse than I described above. One set of
> > > machines has multilib enabled and one doesn't, yet they share a
> > > TUNE_PKGARCH so the sstate shuffling happens for every package! Given your
> > > response it sounds like we should stop doing this.
>
> I shouldn't have said 'every package' there. I should have said 'every
> package where PACKAGE_ARCH = "${TUNE_PKGARCH}"'. In particular
> ${MACHINE_ARCH}, native and allarch packages don't need shuffling.
>
> > > How would you recommend defining our own package architecture in this case?
> > >
> > > 1. TUNE_PKGARCH:append = "-oursuffix"? (Which would also affect
> > > SSTATE_ARCHS_TUNEPKG.)
> >
> > You want sstate arch to be different so this is probably fine.
>
> OK. I did run into some problems when I tried it. I think that option 2
> will probably be better anyway.
>
> > > 2. Set the default PACKAGE_ARCH ?= "${TUNE_PKGARCH}-oursuffix" overriding
> > > the ??= default in bitbake.conf unless someone else overrides it?
> >
> > That would change all recipes and you should only change the recipes
> > affected and their dependencies.
> >
> > If it is all recipes, you want a separate TMPDIR and it is really a
> > different distro (or sub-distro).
Personally, I think you want a separate TMPDIR for ease.
> All recipes that currently have PACKAGE_ARCH set to "${TUNE_PKGARCH}" are
> affected. Not allarch ones, not "${MACHINE_ARCH}" ones and, most
> importantly for saving time and space, not native ones, so using a
> separate TMPDIR would be less than ideal.
Remember that if you share SSTATE_DIR between the two builds (and
probably a hashequiv server), it will reuse all the pieces that match
between the builds. This therefore isn't as expensive as you'd think,
it should reuse everything you mention.
> > > 3. Something else?
> > >
> > > (I had a look through the docs but couldn't find any advice and the
> > > multilib examples don't appear to do this.)
> >
> > FWIW multilib changes PACKAGE_ARCH.
>
> I'm sorry but I realise now that I've been extremely unclear here. When I
> said multilib, I didn't mean that side of multilib that has the
> ${MLPREFIX}, I meant the side that doesn't. You'd think that enabling
> multilib wouldn't have an effect on the non-multilib recipes, but it does
> for aarch64 because doing so changes (at least) libdir. :(
Right, it is assumed that if you want multilib, you always enable it in
your distro and use it in your different configs on at least a per arch
basis. We haven't tried to make the task signatures match with and
without multilib turned on as that would be hard and not that useful in
reality.
> For the MACHINE that enables multilib, we have in our machine
> configuration:
>
> DEFAULTTUNE = "cortexa72-cortexa53-crypto"
> require conf/machine/include/tune-cortexa72-cortexa53.inc
> MULTILIBS = "multilib:lib32"
> DEFAULTTUNE:virtclass-multilib-lib32 = "armv7at-neon"
>
> tune-cortexa72-cortexa53.inc contains:
>
> BASE_LIB:tune-cortexa72-cortexa53-crypto="lib64"
>
> which means that multilib.conf sets libdir based on that to "/usr/lib64".
>
> For the MACHINE that doesn't enable multilib, we have none of that so
> bitbake.conf just uses BASELIB to set libdir to "/usr/lib".
You could try changing that to match with /usr/lib64 but I don't know
if that would allow sstate object reuse or not. Probably not as it
isn't something we've chosen to invest time in.
> I'm not sure if this change is intentional, but it does mean that merely
> enabling multilib and not using it changes behaviour.
>
> For our use, we'd have been much happier with libdir always being
> "/usr/lib" but we unfortunately got baked into the default behaviour early
> on.
You could have chosen /usr/lib for the multilib. I get a lot of
complaints that we don't follow some of the other distro defaults which
is why multilib defaults to them as that was "new" at the time and we
could change them. OE has defaulted to /usr/lib since forever.
Cheers,
Richard
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
2025-06-02 21:04 ` Richard Purdie
2025-06-03 10:07 ` Mike Crowe
@ 2025-06-08 19:20 ` Mike Crowe
2025-06-08 21:35 ` Richard Purdie
1 sibling, 1 reply; 14+ messages in thread
From: Mike Crowe @ 2025-06-08 19:20 UTC (permalink / raw)
To: Richard Purdie; +Cc: bitbake-devel
[
Snip explanation of building first for MACHINE1, then MACHINE2, then back
to MACHINE1 again with the same PACKAGE_ARCH and relying only on the task
hashes being different to ensure that the right files end up being used.
]
On 5/30/25 7:17 PM, Mike Crowe via lists.openembedded.org wrote:
>>>>> My guess is that the problem here is that the stamps from the first
>>>>> machine build weren't removed during the "SECOND MACHINE BUILD 1" step
>>>>> above. If I remove them myself then the problem goes away. Is that
>>>>> theory correct? If so, then I can start trying to work out why and any
>>>>> advice would be welcome. If that theory is not correct then does anyone
>>>>> have any idea where I should start investigating?
On Mon, 2025-06-02 at 17:37 +0100, Mike Crowe wrote:
>> I was hoping that someone would just know the answer to my first question
>> above off the top of their head. If so then I'm willing to try digging into
>> this myself to see what I can discover. I just need to make sure that I'm
>> chasing the right part of the problem.
On Monday 02 June 2025 at 22:04:39 +0100, Richard Purdie wrote:
> I've avoided answering this as I think you're right but I'm not 100%
> sure and I don't want to send you off in the wrong direction. There are
> reasons it might not wipe out stamps and those reasons may be
> intentional but without looking at what is going on I couldn't be
> completely sure. So I think you're right and I'd chase that way but...
It turns out that you were right to be hesitant.
Stamps are only removed if they were generated by the current MACHINE due
to oe-core#5634f2fb1740732056d2c1a22717184ef94405bf from 2018:
| sstate: Ensure a given machine only removes things which it created
|
| Currently if you build qemux86 and then generic86, the latter will
| remove all of the former from deploy and workdir. This is because
| qemux86 is i586, genericx86 is i686 and the architctures are compatible
| therefore the sstate 'cleaup' code kicks in.
|
| There was a valid reason for this to ensure i586 packages didn't get into
| an i686 rootfs for example. With the rootfs creation being filtered now, this
| is no longer necessary.
|
| Instead, save out a list of stamps which a give machine has ever seen in
| a given build and only clean up these things if they're no longer
| "reachable".
|
| In particular this means the autobuilder should no longer spend a load of time
| deleting files when switching MACHINE, improving build times.
I'm not sure if the situation the change describes can still occur.
genericx86 (which is presumably what was meant by generic86) no longer
seems to exist and TUNE_PKGARCH should be either i586 or i686, so the
stamps and work directories won't overlap with each other.
This commit was also in Dunfell though, so the next question is why doesn't
Dunfell suffer from the same problem? The answer surprised me: Dunfell
appears to have been looking for mismatched stamp file names (for me at
least).
In Dunfell, the build for MACHINE1 creates a stamp named:
1-r0.do_populate_lic.sigdata.14708a9f021d264f7c0a9c5c33a93417c6cb20ba936c5805ff96083a9cb05ee9
The first build for MACHINE2 creates a stamp named:
2-r0.do_populate_lic.sigdata.23cae22087dda59b5d9aad048411200dd4a3abb3308a241f9a89be65b7071d01
Yet the debug log for the second build for MACHINE1 contains:
DEBUG: Stampfile /.../build/tmp-glibc/stamps/aarch64-oe-linux/lictest/1-r0.do_populate_lic.14708a9f021d264f7c0a9c5c33a93417c6cb20ba936c5805ff96083a9cb05ee9 not available
It's looking for the stamp file without the ".sigdata" in the middle and
doesn't find one.
In Scarthgap, the stamp files are always created without the ".sigdata" in
the filename for me. This means that the stamp file _is_ found during the
second build for MACHINE1, which stops the do_populate_lic (or
do_populate_lic_setscene) task from running.
If I hack sstate.bbclass's sstate_eventhandler2's line:
if stamp not in stamps and stamp not in preservestamps and stamp in machineindex:
to remove the "and stamp in machineindex", to effectively make
oe-core#5634f2fb17 have no effect, then my problem goes away in Scarthgap:
the do_populate_lic or do_populate_lic_setscene tasks run as expected when
switching MACHINEs. This of course would reintroduce the original problem
that commit was trying to fix though.
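
To make the effect of that condition concrete, here is a toy model in
Python. It is not the sstate.bbclass code: stale_stamps, its arguments and
the shortened stamp names are hypothetical, and only the boolean condition
is taken from the line quoted above.

```python
def stale_stamps(existing, stamps, preservestamps, machineindex,
                 check_machineindex=True):
    """Return the stamps that the quoted condition would select for removal.

    existing       -- stamp names currently present on disk
    stamps         -- stamps reachable in the current build
    preservestamps -- stamps that must always be kept
    machineindex   -- stamps the current MACHINE has recorded as seen
    """
    selected = []
    for stamp in existing:
        if stamp in stamps or stamp in preservestamps:
            continue
        # This is the "and stamp in machineindex" part of the condition.
        if check_machineindex and stamp not in machineindex:
            continue
        selected.append(stamp)
    return selected


# MACHINE2 is building; the only stamp on disk was left behind by MACHINE1.
on_disk = ["1-r0.do_populate_lic.hash1"]
reachable = {"2-r0.do_populate_lic.hash2"}
seen_by_machine2 = {"2-r0.do_populate_lic.hash2"}

with_check = stale_stamps(on_disk, reachable, set(), seen_by_machine2)
without_check = stale_stamps(on_disk, reachable, set(), seen_by_machine2,
                             check_machineindex=False)
```

With the check in place nothing is selected, so the MACHINE1 stamp
survives the MACHINE2 build; without it the stale stamp is selected.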
The ".sigdata" part is not present in Dunfell stamp filenames for
do_populate_lic_setscene tasks. This means that the stamp check would match
and I would have expected the problem to occur on a third pair of builds
(the first pair don't find anything in sstate, the second ones do and run
do_populate_lic_setscene, the third pair should have the
do_populate_lic_setscene stamps left over from the second pair). It does
not. The do_populate_lic_setscene stamps are being removed by
sstate_installpkg calling sstate_clean where the pattern for removal is
tmp-glibc/stamps/aarch64-oe-linux/lictest/*-*.do_populate_lic*. As far as I
can see, this pattern matches the stamps from all builds that share the
same PACKAGE_ARCH. This step only runs for _setscene tasks, so it doesn't
happen for the real do_populate_lic task in the first build for MACHINE2.
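
As a quick check of how broad that pattern is, the following sketch uses
Python's fnmatch module on a few made-up stamp names. The pattern is the
one quoted above; the file names and their shortened hashes are
illustrative only, and I'm assuming shell-style glob semantics here.

```python
from fnmatch import fnmatchcase

# Filename part of the removal pattern quoted above.
PATTERN = "*-*.do_populate_lic*"

# Hypothetical stamp names; the hashes are shortened placeholders.
names = [
    "1-r0.do_populate_lic_setscene.aaaa",
    "2-r0.do_populate_lic_setscene.bbbb",
    "2-r0.do_populate_lic.sigdata.bbbb",
    "1-r0.do_compile.cccc",
]

matched = [name for name in names if fnmatchcase(name, PATTERN)]
```

Both the 1-r0 and 2-r0 do_populate_lic names match, whatever their hash,
while the do_compile name does not.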
I think that all that suggests three potential solutions:
1. Revert the optimisation in oe-core#5634f2fb17 because it isn't safe in
the general case.
2. Ensure that running a real task removes any other stamps for that task
with different hashes (perhaps by calling sstate_clean in the same way
that sstate_installpkg does?)
3. Create a tool that runs "bitbake -S none world" for each MACHINE and
then looks for stamp directories with multiple hashes for the same task.
Assuming the `name.rsplit(".", 3)` trick to extract the task name and
hash from the stamp filename for arbitrary PV in
meta/lib/oeqa/selftest/cases/sstatetests.py is correct this doesn't look
particularly difficult to do.
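
A rough sketch of what option 3 might look like in Python follows. All of
the function names are hypothetical. It assumes that stamp names look like
"PV-PR.do_task.HASH" or "PV-PR.do_task.sigdata.HASH" with a 64-character
hex hash, as in the examples in this thread, and it treats do_foo and
do_foo_setscene as the same task. Anything else (taint files, for example)
is ignored.

```python
import os
import re
from collections import defaultdict

HASH_RE = re.compile(r"^[0-9a-f]{64}$")


def parse_stamp_name(name):
    """Split a stamp or sigdata filename into (task, hash), or return None."""
    parts = name.rsplit(".", 3)
    if len(parts) == 4 and parts[2] == "sigdata":
        task, shash = parts[1], parts[3]
    else:
        parts = name.rsplit(".", 2)
        if len(parts) != 3:
            return None
        task, shash = parts[1], parts[2]
    if not task.startswith("do_") or not HASH_RE.match(shash):
        return None
    # Treat do_foo and do_foo_setscene as the same task.
    if task.endswith("_setscene"):
        task = task[: -len("_setscene")]
    return task, shash


def find_conflicts(entries):
    """Map (directory, task) to its hashes wherever more than one exists."""
    seen = defaultdict(set)
    for directory, name in entries:
        parsed = parse_stamp_name(name)
        if parsed:
            task, shash = parsed
            seen[(directory, task)].add(shash)
    return {key: sorted(hashes)
            for key, hashes in seen.items() if len(hashes) > 1}


def walk_stamps(stamps_dir):
    """Yield (relative directory, filename) for every file under stamps_dir."""
    for root, _dirs, files in os.walk(stamps_dir):
        for name in files:
            yield os.path.relpath(root, stamps_dir), name
```

After running "bitbake -S none" for each MACHINE, something like
find_conflicts(walk_stamps("tmp-glibc/stamps")) would then list the
candidate directories. Files left over from earlier builds would also show
up, so the stamps directory probably needs to start out clean.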
I shall continue to dig, but if anyone has any more ideas then they'd be
gratefully received.
Thanks.
Mike.
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
2025-06-08 19:20 ` Mike Crowe
@ 2025-06-08 21:35 ` Richard Purdie
2025-06-09 14:45 ` Mike Crowe
[not found] ` <1847671617FAC7D3.17668@lists.openembedded.org>
0 siblings, 2 replies; 14+ messages in thread
From: Richard Purdie @ 2025-06-08 21:35 UTC (permalink / raw)
To: Mike Crowe; +Cc: bitbake-devel
On Sun, 2025-06-08 at 20:20 +0100, Mike Crowe wrote:
> [
> Snip explanation of building first for MACHINE1, then MACHINE2, then back
> to MACHINE1 again with the same PACKAGE_ARCH and relying only on the task
> hashes being different to ensure that the right files end up being used.
> ]
>
> On 5/30/25 7:17 PM, Mike Crowe via lists.openembedded.org wrote:
> > > > > > My guess is that the problem here is that the stamps from the first
> > > > > > machine build weren't removed during the "SECOND MACHINE BUILD 1" step
> > > > > > above. If I remove them myself then the problem goes away. Is that
> > > > > > theory correct? If so, then I can start trying to work out why and any
> > > > > > advice would be welcome. If that theory is not correct then does anyone
> > > > > > have any idea where I should start investigating?
>
> On Mon, 2025-06-02 at 17:37 +0100, Mike Crowe wrote:
> > > I was hoping that someone would just know the answer to my first question
> > > above off the top of their head. If so then I'm willing to try digging into
> > > this myself to see what I can discover. I just need to make sure that I'm
> > > chasing the right part of the problem.
>
> On Monday 02 June 2025 at 22:04:39 +0100, Richard Purdie wrote:
> > I've avoided answering this as I think you're right but I'm not 100%
> > sure and I don't want to send you off in the wrong direction. There are
> > reasons it might not wipe out stamps and those reasons may be
> > intentional but without looking at what is going on I couldn't be
> > completely sure. So I think you're right and I'd chase that way but...
>
> It turns out that you were right to be hesitant.
>
> Stamps are only removed if they were generated by the current MACHINE due
> to oe-core#5634f2fb1740732056d2c1a22717184ef94405bf from 2018:
>
> > sstate: Ensure a given machine only removes things which it created
> >
> > Currently if you build qemux86 and then generic86, the latter will
> > remove all of the former from deploy and workdir. This is because
> > qemux86 is i586, genericx86 is i686 and the architctures are compatible
> > therefore the sstate 'cleaup' code kicks in.
> >
> > There was a valid reason for this to ensure i586 packages didn't get into
> > an i686 rootfs for example. With the rootfs creation being filtered now, this
> > is no longer necessary.
> >
> > Instead, save out a list of stamps which a give machine has ever seen in
> > a given build and only clean up these things if they're no longer
> > "reachable".
> >
> > In particular this means the autobuilder should no longer spend a load of time
> > deleting files when switching MACHINE, improving build times.
Well, that makes sense and I think that change still makes sense. It
does raise the question of whether/how we could detect your scenario
and give better information to the user (and/or error?).
> I'm not sure if the situation the change describes can still occur.
> genericx86 (which is presumably what was meant by generic86) no longer
> seems to exist and TUNE_PKGARCH should be either i586 or i686, so the
> stamps and work directories won't overlap with each other.
genericx86 definitely still exists:
https://git.yoctoproject.org/poky/tree/meta-yocto-bsp/conf/machine/genericx86.conf
however it is now core2-32.
That doesn't mean the architecture issue can't exist but the default
tunes have changed. genericx86 and qemux86 still probably share
architecture compatibility and one would try and remove the other.
> This commit was also in Dunfell though, so the next question is why doesn't
> Dunfell suffer from the same problem? The answer surprised me: Dunfell
> appears to have been looking for mismatched stamp file names (for me at
> least).
>
> In Dunfell, the build for MACHINE1 creates a stamp named:
>
> 1-r0.do_populate_lic.sigdata.14708a9f021d264f7c0a9c5c33a93417c6cb20ba936c5805ff96083a9cb05ee9
"sigdata" files are information only files and not actual stamps. This
file contains stamp information but is not the actual stamp itself.
There are two possible versions of the stamp, either setscene or non-
setscene. On my local build, for linux-libc-headers I see:
./core2-64-poky-linux-musl/linux-libc-headers/6.12.do_populate_lic.73e166a4f76ae90f2785472dac9e88264c920aaf98ac2b43b71e0e4b9bec8469
./core2-64-poky-linux-musl/linux-libc-headers/6.12.do_populate_lic.sigdata.73e166a4f76ae90f2785472dac9e88264c920aaf98ac2b43b71e0e4b9bec8469
but I could also see:
./core2-64-poky-linux-musl/linux-libc-headers/6.12.do_populate_lic_setscene.73e166a4f76ae90f2785472dac9e88264c920aaf98ac2b43b71e0e4b9bec8469
The sigdata file provides information on the 73e166a4f7 stamp hash.
Stamps will have zero file size.
> The first build for MACHINE2 creates a stamp named:
>
> 2-r0.do_populate_lic.sigdata.23cae22087dda59b5d9aad048411200dd4a3abb3308a241f9a89be65b7071d01
>
> Yet the debug log for the second build for MACHINE1 contains:
>
> DEBUG: Stampfile /.../build/tmp-glibc/stamps/aarch64-oe-linux/lictest/1-r0.do_populate_lic.14708a9f021d264f7c0a9c5c33a93417c6cb20ba936c5805ff96083a9cb05ee9 not available
>
> It's looking for the stamp file without the ".sigdata" in the middle and
> doesn't find one.
It wouldn't ever look for something with sigdata in it. The code
doesn't really use sigdata files, they're there for debugging.
I suspect in dunfell these files are created somewhere else in stamps.
I have some vague recollection the do_populate_lic files were somehow
shared between arches and we dropped that idea but I don't really
remember...
Cheers,
Richard
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
2025-06-08 21:35 ` Richard Purdie
@ 2025-06-09 14:45 ` Mike Crowe
[not found] ` <1847671617FAC7D3.17668@lists.openembedded.org>
1 sibling, 0 replies; 14+ messages in thread
From: Mike Crowe @ 2025-06-09 14:45 UTC (permalink / raw)
To: Richard Purdie; +Cc: bitbake-devel
On Sunday 08 June 2025 at 22:35:40 +0100, Richard Purdie wrote:
> On Sun, 2025-06-08 at 20:20 +0100, Mike Crowe wrote:
> > [
> >  Snip explanation of building first for MACHINE1, then MACHINE2, then back
> >  to MACHINE1 again with the same PACKAGE_ARCH and relying only on the task
> >  hashes being different to ensure that the right files end up being used.
> > ]
> >
> > On 5/30/25 7:17 PM, Mike Crowe via lists.openembedded.org wrote:
> > > > > > > My guess is that the problem here is that the stamps from the first
> > > > > > > machine build weren't removed during the "SECOND MACHINE BUILD 1" step
> > > > > > > above. If I remove them myself then the problem goes away. Is that
> > > > > > > theory correct? If so, then I can start trying to work out why and any
> > > > > > > advice would be welcome. If that theory is not correct then does anyone
> > > > > > > have any idea where I should start investigating?
> >
> > On Mon, 2025-06-02 at 17:37 +0100, Mike Crowe wrote:
> > > > I was hoping that someone would just know the answer to my first question
> > > > above off the top of their head. If so then I'm willing to try digging into
> > > > this myself to see what I can discover. I just need to make sure that I'm
> > > > chasing the right part of the problem.
> >
> > On Monday 02 June 2025 at 22:04:39 +0100, Richard Purdie wrote:
> > > I've avoided answering this as I think you're right but I'm not 100%
> > > sure and I don't want to send you off in the wrong direction. There are
> > > reasons it might not wipe out stamps and those reasons may be
> > > intentional but without looking at what is going on I couldn't be
> > > completely sure. So I think you're right and I'd chase that way but...
> >
> > It turns out that you were right to be hesitant.
> >
> > Stamps are only removed if they were generated by the current MACHINE due
> > to oe-core#5634f2fb1740732056d2c1a22717184ef94405bf from 2018:
> >
> > > sstate: Ensure a given machine only removes things which it created
> > >
> > > Currently if you build qemux86 and then generic86, the latter will
> > > remove all of the former from deploy and workdir. This is because
> > > qemux86 is i586, genericx86 is i686 and the architctures are compatible
> > > therefore the sstate 'cleaup' code kicks in.
> > >
> > > There was a valid reason for this to ensure i586 packages didn't get into
> > > an i686 rootfs for example. With the rootfs creation being filtered now, this
> > > is no longer necessary.
> > >
> > > Instead, save out a list of stamps which a give machine has ever seen in
> > > a given build and only clean up these things if they're no longer
> > > "reachable".
> > >
> > > In particular this means the autobuilder should no longer spend a load of time
> > > deleting files when switching MACHINE, improving build times.
>
> Well, that makes sense and I think that change still makes sense. It
> does raise the question of whether/how we could detect your scenario
> and give better information to the user (and/or error?).
>
> > I'm not sure if the situation the change describes can still occur.
> > genericx86 (which is presumably what was meant by generic86) no longer
> > seems to exist and TUNE_PKGARCH should be either i586 or i686, so the
> > stamps and work directories won't overlap with each other.
>
> genericx86 definitely still exists:
>
> https://git.yoctoproject.org/poky/tree/meta-yocto-bsp/conf/machine/genericx86.conf
>
> however it is now core2-32.
Ah, I was only looking in oe-core. :(
> That doesn't mean the architecture issue can't exist but the default
> tunes have changed. genericx86 and qemux86 still probably share
> architecture compatibility and one would try and remove the other.
AFAICS both set DEFAULTTUNE ?= "core2-32", so they both have the same
PACKAGE_ARCH = "core2-32". This means that they ought to be identical and
there should be no stamps that need to be removed.
I think that the only time that change has any effect is if a recipe has a
single PACKAGE_ARCH but has different hashes for different MACHINEs. That's
exactly the situation that we think is bad and ought to be reported rather
than supported. But, I must be wrong. Perhaps I'm wrong because stamps for
compatible SSTATE_ARCHs would be removed without this fix too?
This part probably doesn't matter though if we're aiming to just detect
this situation.
> > This commit was also in Dunfell though, so the next question is why doesn't
> > Dunfell suffer from the same problem? The answer surprised me: Dunfell
> > appears to have been looking for mismatched stamp file names (for me at
> > least).
> >
> > In Dunfell, the build for MACHINE1 creates a stamp named:
> >
> >  1-r0.do_populate_lic.sigdata.14708a9f021d264f7c0a9c5c33a93417c6cb20ba936c5805ff96083a9cb05ee9
>
> "sigdata" files are information only files and not actual stamps. This
> file contains stamp information but is not the actual stamp itself.
>
> There are two possible versions of the stamp, either setscene or non-
> setscene. On my local build, for linux-libc-headers I see:
>
> ./core2-64-poky-linux-musl/linux-libc-headers/6.12.do_populate_lic.73e166a4f76ae90f2785472dac9e88264c920aaf98ac2b43b71e0e4b9bec8469
> ./core2-64-poky-linux-musl/linux-libc-headers/6.12.do_populate_lic.sigdata.73e166a4f76ae90f2785472dac9e88264c920aaf98ac2b43b71e0e4b9bec8469
>
> but I could also see:
>
> ./core2-64-poky-linux-musl/linux-libc-headers/6.12.do_populate_lic_setscene.73e166a4f76ae90f2785472dac9e88264c920aaf98ac2b43b71e0e4b9bec8469
>
> The sigdata file provides information on the 73e166a4f7 stamp hash.
> Stamps will have zero file size.
Understood. I have no zero-sized do_populate_lic stamp files for the recipe
in my tmp-glibc/stamps directory on Dunfell. This could be a
consequence of us backporting the fix for
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14123 without some other
necessary prerequisite changes.
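
For anyone wanting to repeat that check, here is a minimal sketch of how a
stamp directory could be split into real stamps and sigdata files. It
assumes the filename layout shown in the listing quoted above
(<version>.<task>[.sigdata].<64-hex-digit hash>); the regular expression
and the function name classify_stamps are illustrative only and are not
taken from BitBake.

```python
import os
import re

# Assumed layout, based only on the filenames quoted in this thread:
#   <version>.<task>[.sigdata].<64 hex digit hash>
STAMP_RE = re.compile(
    r'^(?P<ver>.+)\.(?P<task>do_\w+?)(?P<sigdata>\.sigdata)?\.(?P<hash>[0-9a-f]{64})$'
)

def classify_stamps(stamp_dir):
    """Return (stamps, sigdata, other) lists of filenames found in stamp_dir."""
    stamps, sigdata, other = [], [], []
    for name in sorted(os.listdir(stamp_dir)):
        match = STAMP_RE.match(name)
        if match is None:
            # Not something this sketch recognises
            other.append(name)
        elif match.group('sigdata'):
            # Informational file describing a hash, not a stamp
            sigdata.append(name)
        else:
            # Real stamp, either the normal or the _setscene variant
            stamps.append(name)
    return stamps, sigdata, other
```

Calling classify_stamps() on a recipe's stamp directory and finding an
empty first list would match the situation described above. Checking
os.path.getsize() on each entry would additionally confirm the zero size
that real stamps are expected to have.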
Anyway, I think I've got far enough to determine that the fact that this
problem didn't occur for us in Dunfell is a distraction so I won't pursue
that line of investigation any further and will try to work out how to
detect recipes that change task hashes between MACHINEs for the same
PACKAGE_ARCH instead.
It would be straightforward to look for other stamp files with different
hashes in the TUNE_PKGARCH directory, but that situation could arise
completely legitimately when recipes are being changed but haven't yet been
built for all MACHINEs. Attempting to process the stamp files after
building for all MACHINEs would only work from a clean TMPDIR for a similar
reason.
Instead I hacked together a Python script that reads the locked-sigs.inc
files, that are already conveniently arranged by PACKAGE_ARCH. It seems to
detect problematic recipes for me. At the very least it might help work out
whether this problem is common enough that more work would be worthwhile.
Thanks for all your advice.
Mike.
From 2662100c2d25ccc42ce9a6eafd46b36b0b87b176 Mon Sep 17 00:00:00 2001
From: Mike Crowe <mac@mcrowe.com>
Date: Mon, 9 Jun 2025 15:29:30 +0100
Subject: [PATCH] scripts/contrib: Add check-recipe-pkgarchs script
This script looks for recipes that modify their behaviour (usually,
though not necessarily, for different MACHINEs) but don't use a unique
PACKAGE_ARCH when they do. This can lead to Bitbake thinking that tasks
don't need to be run when they should be.
---
scripts/contrib/check-recipe-pkgarchs | 85 +++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
create mode 100755 scripts/contrib/check-recipe-pkgarchs
diff --git a/scripts/contrib/check-recipe-pkgarchs b/scripts/contrib/check-recipe-pkgarchs
new file mode 100755
index 0000000000..c232a69739
--- /dev/null
+++ b/scripts/contrib/check-recipe-pkgarchs
@@ -0,0 +1,85 @@
+#!/usr/bin/env python3
+#
+# Read a number of locked-sigs.inc files passed on the command line
+# looking for tasks for a given PACKAGE_ARCH that have distinct
+# hashes. This indicates that the recipes probably ought to be using
+# PACKAGE_ARCH = "${MACHINE_ARCH}" or otherwise ensuring that
+# PACKAGE_ARCH changes when any of their task hashes change.
+#
+# The output format is:
+#
+# pkgarch:recipe:task
+# machines-with-hash-1...
+# machines-with-hash-2...
+# ...
+#
+# Example output:
+#
+# cortexa57:lictest:do_compile
+# qemuarm64
+# qemuarm64b qemuarm64c
+#
+# If each machine is on a line on its own then PACKAGE_ARCH =
+# "${MACHINE_ARCH}" is most likely to be the solution. If lines show
+# multiple machines then that would work, but it's possible that
+# another PACKAGE_ARCH might be more efficient.
+#
+# If no problems are found then there is no output and the script will
+# exit successfully. A non-zero exit status indicates that problems
+# were found.
+#
+# Usage:
+#
+# For each MACHINE run:
+# bitbake -S lockedsigs target && mv locked-sigs.inc locked-sigs.${MACHINE}
+#
+# then:
+# check-recipe-pkgarchs locked-sigs.*
+#
+import re, sys
+
+def parse_arch_package_tasks(file_path):
+    result = {}
+    with open(file_path, 'r') as file:
+        content = file.read()
+
+    pattern = re.compile(r'(SIGGEN_LOCKEDSIGS_t-[\w-]+)\s*=\s*"(.*?)"', re.DOTALL)
+    matches = pattern.findall(content)
+
+    for var_name, value in matches:
+        arch = var_name.split('SIGGEN_LOCKEDSIGS_t-')[-1]
+        lines = [line.strip().rstrip('\\\\') for line in value.strip().splitlines() if line.strip()]
+        for line in lines:
+            parts = line.split(':')
+            if len(parts) == 3:
+                package, task, hash = parts
+                key = f"{arch}:{package}:{task}"
+                result[key] = hash.strip()
+
+    return result
+
+sigmap = {}
+for file in sys.argv[1:]:
+    sigs = parse_arch_package_tasks(file)
+    build = file.removeprefix("locked-sigs.")
+    for task, hash in sigs.items():
+        if not task in sigmap:
+            # A new task
+            sigmap[task] = { hash : [build] }
+        elif hash in sigmap[task]:
+            # The same hash in a different build, may be good
+            sigmap[task][hash].append(build)
+        else:
+            # A different hash in a different file, bad
+            sigmap[task][hash] = [build]
+
+result=0
+for task, hashes in sigmap.items():
+    if len(hashes) > 1:
+        print(task)
+        for hash_val, files in hashes.items():
+            files = " ".join(files)
+            print(f" {files}")
+        result=1
+
+sys.exit(result)
--
2.39.5
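
The comparison that the patch performs can also be tried without running
any builds. The following is a minimal sketch that works on in-memory text
rather than files. The sample data is made up (the hashes are shortened
placeholders and the machine names are borrowed from the example output in
the script header), and the helper names parse_locked_sigs and
find_conflicts are illustrative, not part of the patch.

```python
import re
from collections import defaultdict

LOCKED_RE = re.compile(r'SIGGEN_LOCKEDSIGS_t-([\w-]+)\s*=\s*"(.*?)"', re.DOTALL)

def parse_locked_sigs(text):
    """Map 'pkgarch:recipe:task' to its hash for one locked-sigs.inc text."""
    sigs = {}
    for arch, body in LOCKED_RE.findall(text):
        for line in body.splitlines():
            # Drop surrounding whitespace and the trailing line continuation
            parts = line.strip().rstrip('\\').strip().split(':')
            if len(parts) == 3:
                recipe, task, sig = parts
                sigs[f"{arch}:{recipe}:{task}"] = sig
    return sigs

def find_conflicts(builds):
    """builds maps machine name -> locked-sigs text.

    Returns the keys that have more than one hash, each mapped to the
    groups of machines that share a hash.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for machine, text in builds.items():
        for key, sig in parse_locked_sigs(text).items():
            seen[key][sig].append(machine)
    return {key: sorted(groups.values())
            for key, groups in seen.items() if len(groups) > 1}

# Made-up sample data: lictest differs between machines, zlib does not.
SAMPLE_A = r'''
SIGGEN_LOCKEDSIGS_t-cortexa57 = "\
    lictest:do_compile:1111 \
    zlib:do_compile:3333 \
    "
'''
SAMPLE_B = r'''
SIGGEN_LOCKEDSIGS_t-cortexa57 = "\
    lictest:do_compile:2222 \
    zlib:do_compile:3333 \
    "
'''
SAMPLE = {"qemuarm64": SAMPLE_A, "qemuarm64b": SAMPLE_B, "qemuarm64c": SAMPLE_B}
```

With this sample, find_conflicts(SAMPLE) reports only
cortexa57:lictest:do_compile, with qemuarm64 in one group and qemuarm64b
and qemuarm64c in the other, which mirrors the example output described in
the script header.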
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
[not found] ` <1847671617FAC7D3.17668@lists.openembedded.org>
@ 2025-06-12 14:33 ` Mike Crowe
2025-06-13 15:20 ` Richard Purdie
0 siblings, 1 reply; 14+ messages in thread
From: Mike Crowe @ 2025-06-12 14:33 UTC (permalink / raw)
To: Richard Purdie; +Cc: bitbake-devel
[
Snip explanation of problems with recipes that have the same PACKAGE_ARCH
but different task hashes. My understanding of Richard's view was that
although doing this may have worked in the past, it is to be avoided. I
wrote a script to detect tasks with the same PACKAGE_ARCH and differing
hashes.
]
On Monday 09 June 2025 at 15:45:21 +0100, Mike Crowe via lists.openembedded.org wrote:
> Instead I hacked together a Python script that reads the locked-sigs.inc
> files, that are already conveniently arranged by PACKAGE_ARCH. It seems to
> detect problematic recipes for me. At the very least it might help work out
> whether this problem is common enough that more work would be worthwhile.
Hi Richard,
Unfortunately this script revealed that there are various problems that are
difficult to address.
1. CC for allarch packages
We have slightly different values for CC on different MACHINEs. (32-bit
ARMs get -mfp16-format= and particularly-constrained MACHINEs get reduced
SECURITY_CFLAGS. All with the same DEFAULTTUNE are consistent though.) CC
is always exported by bitbake.conf (among several other places). This
taints all task hashes for all recipes, even allarch ones. Setting CC =
"no-cc-for-allarch" in allarch.bbclass works around this problem for
allarch packages.
2. multilib changes baselib for native and allarch packages
The task hashes for native packages change when multilib changes baselib
for the primary (i.e. non-MLPREFIX) architecture. This ends up tainting the
hashes of some allarch packages which depend on -native packages too, such
as ca-certificates.
This can be solved if we can avoid baselib being different. I'm hoping that
we can do this which will make this a non-problem for us. It may be more of
a problem for others.
3. RECIPE_SYSROOT changes its dependencies when using multilib
Between multilib-using and non-multilib-using machines RECIPE_SYSROOT
changes which variables it depends on, even if its expanded version doesn't
change value. This causes the hashes to change:
basehash changed from 7e0c5fd84cfb6c12da76d35af73b20ad139213fe8d3d66c9efb934ad5da2c73a to f3644c0dfa625fc1b60782e4e32afb622abf0620bd3235d07a543b3bb4de4d65
List of dependencies for variable RECIPE_SYSROOT changed from 'frozenset()' to 'frozenset({'MLPREFIX'})'
Dependency on variable MLPREFIX was added
Variable RECIPE_SYSROOT value changed from '${WORKDIR}/recipe-sysroot' to '${WORKDIR}/${MLPREFIX}recipe-sysroot'
The only way I can think of to avoid this is to always set RECIPE_SYSROOT
to "${WORKDIR}/${MLPREFIX}recipe-sysroot" in bitbake.conf (which means
setting RECIPE_SYSROOT:class-native too), even when not using multilib.
4. gcc-source is sensitive to TARGET_CC_ARCH, SECURITY_CFLAGS and
SECURITY_LDFLAGS.
gcc-source.inc already has a list of variables that it has to sanitise. It
can sanitise these too.
5. cross packages are PACKAGE_ARCH = "${BUILD_ARCH}", but they depend on
non-cross packages with the default PACKAGE_ARCH.
This means that changing PACKAGE_ARCH for MACHINEs that have differences in
(e.g.) multilib or SECURITY_CFLAGS isn't sufficient to ensure that they are
kept separate. That different value of PACKAGE_ARCH in (e.g.)
linux-libc-headers taints the hashes for (e.g.) gcc-cross-aarch64.
6. :append breaks the variable sanitising done by native.bbclass,
allarch.bbclass, gcc-source.inc etc.
The :append happens after the variable has been sanitised. The only way I
can see around this is to invent new "EXTRA_" variables included in the
default value.
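
The ordering problem in item 6 can be illustrated with a toy model. This
is only a sketch of the parse-time behaviour described above and is not
BitBake's data store: ToyDatastore and its method names are hypothetical,
and the -march value is a placeholder. The assumption it encodes is that
"=" and "+=" take effect immediately, while ":append" is remembered and
applied only when the variable is read.

```python
class ToyDatastore:
    """Toy model: '=' and '+=' act immediately, ':append' is deferred."""

    def __init__(self):
        self._values = {}
        self._appends = {}

    def set(self, var, value):
        # Models: VAR = "value"
        self._values[var] = value

    def plus_equals(self, var, value):
        # Models: VAR += "value" (immediate, joined with a space)
        current = self._values.get(var, "")
        self._values[var] = f"{current} {value}"

    def append(self, var, value):
        # Models: VAR:append = "value" (stored, applied on read)
        self._appends.setdefault(var, []).append(value)

    def get(self, var):
        # Deferred appends are applied on top of whatever '=' left behind
        return self._values.get(var, "") + "".join(self._appends.get(var, []))


def machine_then_class(use_append):
    """A machine file adds a flag, then a class 'sanitises' the variable."""
    d = ToyDatastore()
    d.set("TARGET_CC_ARCH", "-march=placeholder")
    if use_append:
        d.append("TARGET_CC_ARCH", " -mfp16-format=ieee")
    else:
        d.plus_equals("TARGET_CC_ARCH", "-mfp16-format=ieee")
    # The sanitising assignment, parsed after the machine file
    d.set("TARGET_CC_ARCH", "")
    return d.get("TARGET_CC_ARCH")
```

In this model machine_then_class(use_append=True) still returns the
appended flag after the sanitising assignment, whereas
machine_then_class(use_append=False) returns an empty string because the
later "=" simply replaces the value that "+=" had built up.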
Assuming we can't fix all of these we're left unable to rely on a tool to
identify more serious contraventions. I think this leaves us with four
options:
A. Fix all the serious contraventions of recipes changing behaviour with
the same PACKAGE_ARCH we know about and hope that if any new ones appear
we'll be able to work out what's going wrong and fix them. This isn't
great because it will probably be members of the team who have less
understanding of this stuff that encounter the problems. This assumes
that any problems cause actual build failures rather than silent
misbehaviour.
B. Find some way to fix all the above problems, and any new ones that are
then revealed. Keep using a tool on the autobuilder to make sure that no
new contraventions appear. Encourage use of the tool in other layers
too.
C. Find a way to fix the Bitbake problem that I originally identified at
the start of this thread. I think that I was on the right track by
suggesting that we need to do what sstate_clean does to clean up old
stamps even when running a non-setscene task, but I suspect that this
got muddled up with my confusion over stamp filenames.
D. Use a distinct TMPDIR for each group of "similar" MACHINEs and rely on
sstate to avoid needing to rebuild packages that truly are identical.
I'm not keen on this option because it will require reworking a lot of
the tooling we have around Bitbake. I also think it would preclude the
hope that I have that one day we'd be able to use multiconfig to build
all of our MACHINEs at once in parallel.
In the short term I think that I'm going to have a go at option C, with
the fallback that I can just hack around oe-core#5634f2fb17 to fix the
problem for us.
Thanks.
Mike.
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
2025-06-12 14:33 ` Mike Crowe
@ 2025-06-13 15:20 ` Richard Purdie
2025-06-17 13:47 ` Mike Crowe
0 siblings, 1 reply; 14+ messages in thread
From: Richard Purdie @ 2025-06-13 15:20 UTC (permalink / raw)
To: mac; +Cc: bitbake-devel
On Thu, 2025-06-12 at 15:33 +0100, Mike Crowe via lists.openembedded.org wrote:
> [
> Snip explanation of problems with recipes that have the same PACKAGE_ARCH
> but different task hashes. My understanding of Richard's view was that
> although doing this may have worked in the past, it is to be avoided. I
> wrote a script to detect tasks with the same PACKAGE_ARCH and differing
> hashes.
> ]
>
> On Monday 09 June 2025 at 15:45:21 +0100, Mike Crowe via lists.openembedded.org wrote:
> > Instead I hacked together a Python script that reads the locked-sigs.inc
> > files, that are already conveniently arranged by PACKAGE_ARCH. It seems to
> > detect problematic recipes for me. At the very least it might help work out
> > whether this problem is common enough that more work would be worthwhile.
>
> Hi Richard,
>
> Unfortunately this script revealed that there are various problems that are
> difficult to address.
>
> 1. CC for allarch packages
>
> We have slightly different values for CC on different MACHINEs. (32-bit
> ARMs get -mfp16-format= and particularly-constrained MACHINEs get reduced
> SECURITY_CFLAGS. All with the same DEFAULTTUNE are consistent though.) CC
> is always exported by bitbake.conf (among several other places). This
> taints all task hashes for all recipes, even allarch ones. Setting CC =
> "no-cc-for-allarch" in allarch.bbclass works around this problem for
> allarch packages.
Where are we adding flags to CC rather than CFLAGS? That sounds like
the real bug somewhere...
> 2. multilib changes baselib for native and allarch packages
>
> The task hashes for native packages change when multilib changes baselib
> for the primary (i.e. non-MLPREFIX) architecture. This ends up tainting the
> hashes of some allarch packages which depend on -native packages too, such
> as ca-certificates.
>
> This can be solved if we can avoid baselib being different. I'm hoping that
> we can do this which will make this a non-problem for us. It may be more of
> a problem for others.
>
> 3. RECIPE_SYSROOT changes its dependencies when using multilib
>
> Between multilib-using and non-multilib-using machines RECIPE_SYSROOT
> changes which variables it depends on, even if its expanded version doesn't
> change value. This causes the hashes to change:
>
> basehash changed from 7e0c5fd84cfb6c12da76d35af73b20ad139213fe8d3d66c9efb934ad5da2c73a to f3644c0dfa625fc1b60782e4e32afb622abf0620bd3235d07a543b3bb4de4d65
> List of dependencies for variable RECIPE_SYSROOT changed from 'frozenset()' to 'frozenset({'MLPREFIX'})'
> Dependency on variable MLPREFIX was added
> Variable RECIPE_SYSROOT value changed from '${WORKDIR}/recipe-sysroot' to '${WORKDIR}/${MLPREFIX}recipe-sysroot'
>
> The only way I can think of to avoid this is to always set RECIPE_SYSROOT
> to "${WORKDIR}/${MLPREFIX}recipe-sysroot" in bitbake.conf (which means
> setting RECIPE_SYSROOT:class-native too), even when not using multilib.
One trick that may work for this is setting:
RECIPE_SYSROOT[vardepvalue] = "${RECIPE_SYSROOT}"
>
> 4. gcc-source is sensitive to TARGET_CC_ARCH, SECURITY_CFLAGS and
> SECURITY_LDFLAGS.
>
> gcc-source.inc already has a list of variables that it has to sanitise. It
> can sanitise these too.
>
> 5. cross packages are PACKAGE_ARCH = "${BUILD_ARCH}", but they depend on
> non-cross packages with the default PACKAGE_ARCH.
>
> This means that changing PACKAGE_ARCH for MACHINEs that have differences in
> (e.g.) multilib or SECURITY_CFLAGS isn't sufficient to ensure that they are
> kept separate. That different value of PACKAGE_ARCH in (e.g.)
> linux-libc-headers taints the hashes for (e.g.) gcc-cross-aarch64.
I suspect that is meant to be getting excluded from the hashes...
> 6. :append breaks the variable sanitising done by native.bbclass,
> allarch.bbclass, gcc-source.inc etc.
>
> The :append happens after the variable has been sanitised. The only way I
> can see around this is to invent new "EXTRA_" variables included in the
> default value.
Which variables are we appending and can we stop doing that? I keep
telling people we need to avoid :append and friends...
> Assuming we can't fix all of these we're left unable to rely on a tool to
> identify more serious contraventions. I think this leaves us with four
> options:
>
> A. Fix all the serious contraventions of recipes changing behaviour with
> the same PACKAGE_ARCH we know about and hope that if any new ones appear
> we'll be able to work out what's going wrong and fix them. This isn't
> great because it will probably be members of the team who have less
> understanding of this stuff that encounter the problems. This assumes
> that any problems cause actual build failures rather than silent
> misbehaviour.
>
> B. Find some way to fix all the above problems, and any new ones that are
> then revealed. Keep using a tool on the autobuilder to make sure that no
> new contraventions appear. Encourage use of the tool in other layers
> too.
>
> C. Find a way to fix the Bitbake problem that I originally identified at
> the start of this thread. I think that I was on the right track by
> suggesting that we need to do what sstate_clean does to clean up old
> stamps even when running a non-setscene task, but I suspect that this
> got muddled up with my confusion over stamp filenames.
>
> D. Use a distinct TMPDIR for each group of "similar" MACHINEs and rely on
> sstate to avoid needing to rebuild packages that truly are identical.
> I'm not keen on this option because it will require reworking a lot of
> the tooling we have around Bitbake. I also think it would preclude the
> hope that I have that one day we'd be able to use multiconfig to build
> all of our MACHINEs at once in parallel.
>
>
> In the short term I think that I'm going to have a go at option C, with
> the fallback that I can just hack around oe-core#5634f2fb17 to fix the
> problem for us.
I've done my best to give some ideas/thoughts above...
Cheers,
Richard
* Re: [bitbake-devel] Should stamp files for different versions of a recipe exist at the same time?
2025-06-13 15:20 ` Richard Purdie
@ 2025-06-17 13:47 ` Mike Crowe
0 siblings, 0 replies; 14+ messages in thread
From: Mike Crowe @ 2025-06-17 13:47 UTC (permalink / raw)
To: Richard Purdie; +Cc: bitbake-devel
On Friday 13 June 2025 at 16:20:46 +0100, Richard Purdie wrote:
> On Thu, 2025-06-12 at 15:33 +0100, Mike Crowe via lists.openembedded.org wrote:
> > [
> >  Snip explanation of problems with recipes that have the same PACKAGE_ARCH
> >  but different task hashes. My understanding of Richard's view was that
> >  although doing this may have worked in the past, it is to be avoided. I
> >  wrote a script to detect tasks with the same PACKAGE_ARCH and differing
> >  hashes.
> > ]
> >
> > On Monday 09 June 2025 at 15:45:21 +0100, Mike Crowe via lists.openembedded.org wrote:
> > > Instead I hacked together a Python script that reads the locked-sigs.inc
> > > files, that are already conveniently arranged by PACKAGE_ARCH. It seems to
> > > detect problematic recipes for me. At the very least it might help work out
> > > whether this problem is common enough that more work would be worthwhile.
> >
> > Hi Richard,
> >
> > Unfortunately this script revealed that there are various problems that are
> > difficult to address.
> >
> > 1. CC for allarch packages
> >
> > We have slightly different values for CC on different MACHINEs. (32-bit
> > ARMs get -mfp16-format= and particularly-constrained MACHINEs get reduced
> > SECURITY_CFLAGS. All with the same DEFAULTTUNE are consistent though.) CC
> > is always exported by bitbake.conf (among several other places). This
> > taints all task hashes for all recipes, even allarch ones. Setting CC =
> > "no-cc-for-allarch" in allarch.bbclass works around this problem for
> > allarch packages.
>
> Where are we adding flags to CC rather than CFLAGS? That sounds like
> the real bug somewhere...
We were doing that ourselves. I can switch to appending to TARGET_CC_ARCH
instead, but that is problematic too as described below.
> > 3. RECIPE_SYSROOT changes its dependencies when using multilib
> >
> > Between multilib-using and non-multilib-using machines RECIPE_SYSROOT
> > changes which variables it depends on, even if its expanded version doesn't
> > change value. This causes the hashes to change:
> >
> >  basehash changed from 7e0c5fd84cfb6c12da76d35af73b20ad139213fe8d3d66c9efb934ad5da2c73a to f3644c0dfa625fc1b60782e4e32afb622abf0620bd3235d07a543b3bb4de4d65
> >  List of dependencies for variable RECIPE_SYSROOT changed from 'frozenset()' to 'frozenset({'MLPREFIX'})'
> >  Dependency on variable MLPREFIX was added
> >  Variable RECIPE_SYSROOT value changed from '${WORKDIR}/recipe-sysroot' to '${WORKDIR}/${MLPREFIX}recipe-sysroot'
> >
> > The only way I can think of to avoid this is to always set RECIPE_SYSROOT
> > to "${WORKDIR}/${MLPREFIX}recipe-sysroot" in bitbake.conf (which means
> > setting RECIPE_SYSROOT:class-native too), even when not using multilib.
>
> One trick that may work for this is setting:
>
> RECIPE_SYSROOT[vardepvalue] = "${RECIPE_SYSROOT}"
Thanks. That would be neater if it works. I'll try it.
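
Why the hash changes even though the expanded path is identical, and why
hashing the expanded value might avoid that, can be shown with a toy
model. This is a sketch only: toy_signature is hypothetical and real
BitBake signature generation is considerably more involved. It assumes
that a signature normally covers a variable's unexpanded value together
with the variables it references, and that the vardepvalue suggestion
amounts to hashing the expanded value instead. The WORKDIR value is a
placeholder.

```python
import hashlib
import re

REF_RE = re.compile(r'\$\{(\w+)\}')

def expand(name, variables):
    """Recursively expand ${VAR} references (toy version, no cycle check)."""
    return REF_RE.sub(lambda m: expand(m.group(1), variables), variables[name])

def toy_signature(name, variables, use_expanded_value=False):
    """Hash the unexpanded value plus its references, or the expanded value."""
    if use_expanded_value:
        # Models the effect hoped for from VAR[vardepvalue] = "${VAR}"
        material = expand(name, variables)
    else:
        raw = variables[name]
        deps = sorted(set(REF_RE.findall(raw)))
        material = raw + "|" + "|".join(f"{dep}={variables[dep]}" for dep in deps)
    return hashlib.sha256(material.encode()).hexdigest()

# Placeholder configurations: without multilib and with an empty MLPREFIX.
PLAIN = {
    "WORKDIR": "/work",
    "RECIPE_SYSROOT": "${WORKDIR}/recipe-sysroot",
}
MULTILIB = {
    "WORKDIR": "/work",
    "MLPREFIX": "",
    "RECIPE_SYSROOT": "${WORKDIR}/${MLPREFIX}recipe-sysroot",
}
```

Both configurations expand RECIPE_SYSROOT to the same path, yet
toy_signature("RECIPE_SYSROOT", ...) differs between them by default
because the unexpanded text and the set of references differ. With
use_expanded_value=True the two signatures agree.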
> > 5. cross packages are PACKAGE_ARCH = "${BUILD_ARCH}", but they depend on
> >    non-cross packages with the default PACKAGE_ARCH.
> >
> > This means that changing PACKAGE_ARCH for MACHINEs that have differences in
> > (e.g.) multilib or SECURITY_CFLAGS isn't sufficient to ensure that they are
> > kept separate. That different value of PACKAGE_ARCH in (e.g.)
> > linux-libc-headers taints the hashes for (e.g.) gcc-cross-aarch64.
>
> I suspect that is meant to be getting excluded from the hashes...
Perhaps. I don't think I understand enough to prove to myself that doing
this is correct. Maybe once I've solved all the other problems I can return
to this one.
> > 6. :append breaks the variable sanitising done by native.bbclass,
> >    allarch.bbclass, gcc-source.inc etc.
> >
> > The :append happens after the variable has been sanitised. The only way I
> > can see around this is to invent new "EXTRA_" variables included in the
> > default value.
>
> Which variables are we appending and can we stop doing that? I keep
> telling people we need to avoid :append and friends...
We're doing it in a few places ourselves, such as:
TARGET_CC_ARCH:append:armv7ve = " -mfp16-format=ieee"
I can work around this one by putting it only in the machine file so I can
use += rather than :append.
But oe-core also does this in meta/conf/distro/include/time64.inc:
TARGET_CC_ARCH:append:x86 = "${@bb.utils.contains('TUNE_FEATURES', 'm32', '${GLIBC_64BIT_TIME_FLAGS}', '', d)}"
for example and I can't work around that one from outside.
Whilst addressing some of the above problems I ran into more problems with
gcc-source.inc's sanitising of variables being ineffective when those
variables are assigned using overrides. I was able to work around this by
making gcc-source.inc use :forcevariable for its assignments but I'm
worried that this just risks causing more problems than it solves. :(
[snip]
> I've done my best to give some ideas/thoughts above...
I'm very grateful for the time and effort you've put into providing advice.
It's become clear to me that we've ended up doing things in a rather
different way to others which means that we're bound to run into our own
unique problems.
Thanks.
Mike.
Thread overview: 14+ messages
2025-05-30 17:17 Should stamp files for different versions of a recipe exist at the same time? Mike Crowe
2025-06-02 10:15 ` [bitbake-devel] " Quentin Schulz
2025-06-02 10:49 ` Mike Crowe
2025-06-02 14:17 ` Richard Purdie
2025-06-02 16:37 ` Mike Crowe
2025-06-02 21:04 ` Richard Purdie
2025-06-03 10:07 ` Mike Crowe
2025-06-03 10:18 ` Richard Purdie
2025-06-08 19:20 ` Mike Crowe
2025-06-08 21:35 ` Richard Purdie
2025-06-09 14:45 ` Mike Crowe
[not found] ` <1847671617FAC7D3.17668@lists.openembedded.org>
2025-06-12 14:33 ` Mike Crowe
2025-06-13 15:20 ` Richard Purdie
2025-06-17 13:47 ` Mike Crowe