Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror

Openembedded Core Discussions
 help / color / mirror / Atom feed

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-05  0:44 Max Yu
@ 2024-10-07 11:57 ` Alexander Kanavin
  2024-10-07 20:09   ` Yu, Max
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Kanavin @ 2024-10-07 11:57 UTC (permalink / raw)
  To: yumx; +Cc: openembedded-core, Przemyslaw Sobon

I don't think I can agree with this. This means stability regressions
in one's network infrastructure will go unnoticed and unfixed, and
goes contrary to some people wishing to know immediately when
'corrupted sstate' occurs, and fix it there and then. Similar patches
have been rejected in the past.

Please fix your infra.

Alex

On Sat, 5 Oct 2024 at 02:44, Yu, Max via lists.openembedded.org
<yumx=amazon.com@lists.openembedded.org> wrote:
>
> We observe sstate cache corruptions sometimes which cause rebuilds. That is not
> a fatal error as the package has to be rebuilt and updated artifact needs to be
> pushed to remote sstate cache mirror. Currently, Yocto does not handle
> corruptions properly, where the corrupted artifact is not deleted or
> renamed. Later, after the package is built the same corrupted artifact is pushed
> to remote mirror and the same procedure is circled again and again.
>
> This change verifies the outcome of the unpacking action and renames the
> artifact if a fatal error occurred ("tar" tool returns error 2). In such case we
> rename the artifact what causes that a proper one is created and uploaded
> overwriting the exisiting one - the corrupted one - in the remote mirror. That
> way we break the loop of uploading corrupted file again and again.
>
> Suggested-by: Przemyslaw Sobon <psobon@amazon.com>
> Signed-off-by: Max Yu <yumx@amazon.com>
> ---
>  meta/classes-global/sstate.bbclass | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/meta/classes-global/sstate.bbclass b/meta/classes-global/sstate.bbclass
> index 11bb892a42..5a7ce35341 100644
> --- a/meta/classes-global/sstate.bbclass
> +++ b/meta/classes-global/sstate.bbclass
> @@ -937,7 +937,12 @@ sstate_unpack_package () {
>                 ZSTD="pzstd -p ${ZSTD_THREADS}"
>         fi
>
> -       tar -I "$ZSTD" -xvpf ${SSTATE_PKG}
> +       if ! tar -I "$ZSTD" -xvpf ${SSTATE_PKG}; then
> +               echo "Fatal error extracting sstate cache artifacts, file might be corrupted or truncated, renaming"
> +               mv ${SSTATE_PKG} ${SSTATE_PKG}.unpack_error
> +               exit 2
> +       fi
> +
>         # update .siginfo atime on local/NFS mirror if it is a symbolic link
>         [ ! -h ${SSTATE_PKG}.siginfo ] || [ ! -e ${SSTATE_PKG}.siginfo ] || touch -a ${SSTATE_PKG}.siginfo 2>/dev/null || true
>         # update each symbolic link instead of any referenced file
> --
> 2.40.1
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#205237): https://lists.openembedded.org/g/openembedded-core/message/205237
> Mute This Topic: https://lists.openembedded.org/mt/108828269/1686489
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alex.kanavin@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-07 11:57 ` [OE-core] " Alexander Kanavin
@ 2024-10-07 20:09   ` Yu, Max
  2024-10-07 20:38     ` Alexander Kanavin
  2024-10-08  9:37     ` Alexander Kanavin
  0 siblings, 2 replies; 12+ messages in thread
From: Yu, Max @ 2024-10-07 20:09 UTC (permalink / raw)
  To: Alexander Kanavin
  Cc: openembedded-core@lists.openembedded.org, Sobon, Przemyslaw

Hi Alex,

Corruptions are unavoidable part of our life, disks and network can inject failures due to unpredictable and unknown reasons like https://www.sciencedirect.com/science/article/abs/pii/S0026271421003723. Even multi-layer protection is not perfect as it depends on where the error is injected.

We proposed this patch as it is aligned with how we currently handle incorrect checksums: https://github.com/yoctoproject/poky/commit/672c07de4a96eb67eaafba0873eced44ec9ae1a6.

For context, we have builds running at a large scale, almost 24/7. This scale contributes to the following challenges:
1. sstate corruption for us happens ~1/10000 of builds. These are extremely hard to reproduce and debug...
2. with current behavior of reuploading the corrupted sstate object, we end up in an endless loop of rebuilding that we cannot break.
We considered this yocto behavior as a bug, specifically because of #2.

I hear your point, even though following your reasoning we should fail the entire build rather than rebuilding the packages. We can explore other options like parameterizing the behavior, but it will be very useful to be able to break the bad loop of rebuilding somehow.

Thanks,
Max

On 10/7/24, 4:58 AM, "Alexander Kanavin" <alex.kanavin@gmail.com <mailto:alex.kanavin@gmail.com>> wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

I don't think I can agree with this. This means stability regressions
in one's network infrastructure will go unnoticed and unfixed, and
goes contrary to some people wishing to know immediately when
'corrupted sstate' occurs, and fix it there and then. Similar patches
have been rejected in the past.

Please fix your infra.

Alex

On Sat, 5 Oct 2024 at 02:44, Yu, Max via lists.openembedded.org
<yumx=amazon.com@lists.openembedded.org <mailto:amazon.com@lists.openembedded.org>> wrote:
>
> We observe sstate cache corruptions sometimes which cause rebuilds. That is not
> a fatal error as the package has to be rebuilt and updated artifact needs to be
> pushed to remote sstate cache mirror. Currently, Yocto does not handle
> corruptions properly, where the corrupted artifact is not deleted or
> renamed. Later, after the package is built the same corrupted artifact is pushed
> to remote mirror and the same procedure is circled again and again.
>
> This change verifies the outcome of the unpacking action and renames the
> artifact if a fatal error occurred ("tar" tool returns error 2). In such case we
> rename the artifact what causes that a proper one is created and uploaded
> overwriting the exisiting one - the corrupted one - in the remote mirror. That
> way we break the loop of uploading corrupted file again and again.
>
> Suggested-by: Przemyslaw Sobon <psobon@amazon.com <mailto:psobon@amazon.com>>
> Signed-off-by: Max Yu <yumx@amazon.com <mailto:yumx@amazon.com>>
> ---
> meta/classes-global/sstate.bbclass | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/meta/classes-global/sstate.bbclass b/meta/classes-global/sstate.bbclass
> index 11bb892a42..5a7ce35341 100644
> --- a/meta/classes-global/sstate.bbclass
> +++ b/meta/classes-global/sstate.bbclass
> @@ -937,7 +937,12 @@ sstate_unpack_package () {
> ZSTD="pzstd -p ${ZSTD_THREADS}"
> fi
>
> - tar -I "$ZSTD" -xvpf ${SSTATE_PKG}
> + if ! tar -I "$ZSTD" -xvpf ${SSTATE_PKG}; then
> + echo "Fatal error extracting sstate cache artifacts, file might be corrupted or truncated, renaming"
> + mv ${SSTATE_PKG} ${SSTATE_PKG}.unpack_error
> + exit 2
> + fi
> +
> # update .siginfo atime on local/NFS mirror if it is a symbolic link
> [ ! -h ${SSTATE_PKG}.siginfo ] || [ ! -e ${SSTATE_PKG}.siginfo ] || touch -a ${SSTATE_PKG}.siginfo 2>/dev/null || true
> # update each symbolic link instead of any referenced file
> --
> 2.40.1
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#205237): https://lists.openembedded.org/g/openembedded-core/message/205237 <https://lists.openembedded.org/g/openembedded-core/message/205237>
> Mute This Topic: https://lists.openembedded.org/mt/108828269/1686489 <https://lists.openembedded.org/mt/108828269/1686489>
> Group Owner: openembedded-core+owner@lists.openembedded.org <mailto:openembedded-core+owner@lists.openembedded.org>
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub <https://lists.openembedded.org/g/openembedded-core/unsub> [alex.kanavin@gmail.com <mailto:alex.kanavin@gmail.com>]
> -=-=-=-=-=-=-=-=-=-=-=-
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-07 20:09   ` Yu, Max
@ 2024-10-07 20:38     ` Alexander Kanavin
  2024-10-08  9:37     ` Alexander Kanavin
  1 sibling, 0 replies; 12+ messages in thread
From: Alexander Kanavin @ 2024-10-07 20:38 UTC (permalink / raw)
  To: Yu, Max; +Cc: openembedded-core@lists.openembedded.org, Sobon, Przemyslaw

On Mon, 7 Oct 2024 at 22:09, Yu, Max <yumx@amazon.com> wrote:
> Corruptions are unavoidable part of our life, disks and network can inject failures due to unpredictable and unknown reasons like https://www.sciencedirect.com/science/article/abs/pii/S0026271421003723. Even multi-layer protection is not perfect as it depends on where the error is injected.
>
> We proposed this patch as it is aligned with how we currently handle incorrect checksums: https://github.com/yoctoproject/poky/commit/672c07de4a96eb67eaafba0873eced44ec9ae1a6.
>
> For context, we have builds running at a large scale, almost 24/7. This scale contributes to the following challenges:
> 1. sstate corruption for us happens ~1/10000 of builds. These are extremely hard to reproduce and debug...
> 2. with current behavior of reuploading the corrupted sstate object, we end up in an endless loop of rebuilding that we cannot break.
> We considered this yocto behavior as a bug, specifically because of #2.
>
> I hear your point, even though following your reasoning we should fail the entire build rather than rebuilding the packages. We can explore other options like parameterizing the behavior, but it will be very useful to be able to break the bad loop of rebuilding somehow.

Perhaps you can describe how the issue can be reproduced and observed
in a local build? (e.g. build something, corrupt sstate for it,
observe the endless rebuild problem)
I'd like to understand particularly where in existing code the endless
cycle would be triggered.

Alex


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-07 20:09   ` Yu, Max
  2024-10-07 20:38     ` Alexander Kanavin
@ 2024-10-08  9:37     ` Alexander Kanavin
  1 sibling, 0 replies; 12+ messages in thread
From: Alexander Kanavin @ 2024-10-08  9:37 UTC (permalink / raw)
  To: Yu, Max; +Cc: openembedded-core@lists.openembedded.org, Sobon, Przemyslaw

On Mon, 7 Oct 2024 at 22:09, Yu, Max <yumx@amazon.com> wrote:
> Corruptions are unavoidable part of our life, disks and network can inject failures due to unpredictable and unknown reasons like https://www.sciencedirect.com/science/article/abs/pii/S0026271421003723. Even multi-layer protection is not perfect as it depends on where the error is injected.

I'm sorry but I am not convinced. You are the only one reporting the
issue, so I don't think it's unavoidable.

> We proposed this patch as it is aligned with how we currently handle incorrect checksums: https://github.com/yoctoproject/poky/commit/672c07de4a96eb67eaafba0873eced44ec9ae1a6.

What the patch is doing is essentially -c cleansstate on a cache used
by several consumers. This makes it a non-starter: you can't remove
items from live sstate like that. Once an object is in the cache, it
needs to be removed offline, or replaced atomically if it's corrupted.

Before we decide how to handle corrupted items, it's perhaps better to
consider first how they should be reported. Then users can decide what
they want to do with that information without yocto forcing something
on them. So please: how can I observe the issue?

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
@ 2024-10-08 18:47 Sobon, Przemyslaw
  2024-10-10 12:18 ` Alexander Kanavin
  0 siblings, 1 reply; 12+ messages in thread
From: Sobon, Przemyslaw @ 2024-10-08 18:47 UTC (permalink / raw)
  To: Alexander Kanavin, Yu, Max; +Cc: openembedded-core@lists.openembedded.org

On Tuesday, October 8, 2024 2:38 AM Alexander Kanavin <alex.kanavin@gmail.com> wrote:
>On Mon, 7 Oct 2024 at 22:09, Yu, Max <yumx@amazon.com> wrote:
>> Corruptions are unavoidable part of our life, disks and network can inject
>> failures due to unpredictable and unknown reasons like
>> https://www.sciencedirect.com/science/article/abs/pii/S0026271421003723.
>> Even multi-layer protection is not perfect as it depends on where the error
>> is injected.
> I'm sorry but I am not convinced. You are the only one reporting the issue,
> so I don't think it's unavoidable.
The "only one reporting" does not mean the problem does not exist, it may be
just not as big problem for others as for us. Similar problem exist for DRAM
memory corruptions? Most of the people don't care about that but for some
this is important problem, e.g. when you see 1 OS crash per 10 years it is
not a big deal but if you own 10k servers you see 3 crashes per day. That is
the scale factor that is important. Max talked about our scale already.
Summarizing, the manual work is not a solution for us due to scale.

>> We proposed this patch as it is aligned with how we currently handle
>> incorrect checksums:
>> https://github.com/yoctoproject/poky/commit/672c07de4a96eb67eaafba0873eced44ec9ae1a6.
> What the patch is doing is essentially -c cleansstate on a cache used by
> several consumers. This makes it a non-starter: you can't remove items from
> live sstate like that. Once an object is in the cache, it needs to be removed
> offline, or replaced atomically if it's corrupted.
I disagree, we can overwrite bad artifact. Yocto indirectly does that as it has
to rebuild the package. This is "by design" behavior. And to be honest, there is
no difference between (1) rebuilding the package every time and (2) overwriting
sstate cache so any other build can reuse it. Is there any concern around
uploading such freshly built artifact?

> Before we decide how to handle corrupted items, it's perhaps better to
> consider first how they should be reported. Then users can decide what they
> want to do with that information without yocto forcing something on them.
> So please: how can I observe the issue?
This is random thing, we are not in control of e.g. DRAM bit flip error, they
simply happen. To simulate the situation you can inject an error yourself by
e.g., overwriting the random byte of the zstd file before it is uploaded.

>Alex

Regards,
Przemyslaw

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-08 18:47 [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror Sobon, Przemyslaw
@ 2024-10-10 12:18 ` Alexander Kanavin
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Kanavin @ 2024-10-10 12:18 UTC (permalink / raw)
  To: Sobon, Przemyslaw; +Cc: Yu, Max, openembedded-core@lists.openembedded.org

On Tue, 8 Oct 2024 at 20:47, Sobon, Przemyslaw <psobon@amazon.com> wrote:
> The "only one reporting" does not mean the problem does not exist, it may be
> just not as big problem for others as for us. Similar problem exist for DRAM
> memory corruptions? Most of the people don't care about that but for some
> this is important problem, e.g. when you see 1 OS crash per 10 years it is
> not a big deal but if you own 10k servers you see 3 crashes per day. That is
> the scale factor that is important. Max talked about our scale already.
> Summarizing, the manual work is not a solution for us due to scale.

But can you say what your scale actually is? How many sstate objects
are written into the shared cache per day? How often do you see
corruptions?

Basically it helps if you introduce yourselves and your product first,
as this is I think you first time interacting with the community?

> I disagree, we can overwrite bad artifact. Yocto indirectly does that as it has
> to rebuild the package. This is "by design" behavior. And to be honest, there is
> no difference between (1) rebuilding the package every time and (2) overwriting
> sstate cache so any other build can reuse it. Is there any concern around
> uploading such freshly built artifact?

The concern is that your patch does not overwrite the artifact, rather
it deletes it first, and recreates it later. This creates a time
window where a cache object exists, and then it doesn't, and then it
exists again. This will break builds running in parallel in all sorts
of interesting ways, as sstate is not designed for objects
disappearing after they've been checked and confirmed to exist by
bitbake.

For example, when we do need to test cache deletions (for instance in
oe-selftest), we make super-sure that this is done on a private small
test cache that isn't shared with anything, as otherwise there have
been notoriously strange failures in random places.

The other concern I expressed to Max: this auto-recovery sweeps the
'flaky hardware' problem under the rug, instead of being loud and
clear about it. If someone had a perfectly working sstate (and many
people do, including the yocto upstream), and then it started throwing
random fails, they're not going to notice it. If someone had very rare
corruptions and then the rate increased, they're not going to notice
that either. Except when they start to wonder why builds seem to take
longer and longer and longer.

> This is random thing, we are not in control of e.g. DRAM bit flip error, they
> simply happen. To simulate the situation you can inject an error yourself by
> e.g., overwriting the random byte of the zstd file before it is uploaded.

Yes, I saw it. The key issue is this bit in sstate.bbclass before
actually creating the sstate archive:

        if sstate_pkg.exists():
            touch(sstate_pkg)
            return

If sstate item exists, then it will not be replaced, even if it's been
determined to be corrupted earlier.

I don't know yet how to best handle this, but I would want to improve
*reporting* of corrupt sstate before we can decide whether yocto can
do something about it that doesn't make things worse than they are
now.

Then you can take the report and run a script that deletes the
offending items. This all can be automated, and doesn't have to be
executed manually.

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
@ 2024-10-15 18:25 Yu, Max
  2024-10-15 20:32 ` Richard Purdie
  0 siblings, 1 reply; 12+ messages in thread
From: Yu, Max @ 2024-10-15 18:25 UTC (permalink / raw)
  To: Alexander Kanavin, Sobon, Przemyslaw
  Cc: openembedded-core@lists.openembedded.org

> But can you say what your scale actually is? How many sstate objects
> are written into the shared cache per day? How often do you see
> corruptions?

Our sstate cache has 25TB and 12M objects. I do not have the read/write metrics, but we run ~5k image builds per day, and we were seeing the corruption issue about once per two weeks before this patch. Our image builds range from 3k - 8k build steps, but these numbers become funky since these are not builds from scratch. (These corruptions we're facing are likely not bit flips, the example I gave was just trying to illustrate how rare events can happen that shouldn't just be summarized with "fix your infra". If it makes things better or worse, we only started to see these corruption issues after moving to kirkstone where they are now compressed with pzstd...)

For some more context, we at AWS use Yocto to build the OS for a lot of hardware platforms and a number of smaller component images. Cartesian product the two and we end up with needing to build a lot of images...

Specifically related to sstate caches, we have a remote sstate mirror setup in s3. All CI builds update the s3 mirror, so we have a large number of writers. For all our builds, we setup a local build directory per repo (a repo can have multiple), which is where we host the local sstate cache, and we parallelize based on build directory.

> The concern is that your patch does not overwrite the artifact, rather
> it deletes it first, and recreates it later. This creates a time
> window where a cache object exists, and then it doesn't, and then it
> exists again. This will break builds running in parallel in all sorts
> of interesting ways, as sstate is not designed for objects
> disappearing after they've been checked and confirmed to exist by
> bitbake.

This makes a lot of sense. This is a use case we didn't consider when creating the patch and not applicable to us. (Since we parallelize by multiple build directories that don't share a local sstate cache. Developer builds don't really care about the extra local parallelization, while CI builds share the remote sstate cache.)

> Then you can take the report and run a script that deletes the
> offending items. This all can be automated, and doesn't have to be
> executed manually.

So that's the tricky thing for us. We will have to run the script at a time window where no builds are happening. Because when I tried deleting corrupted objects from the s3 sstate cache manually, the corrupted sstate object ended up being reuploaded by ongoing builds... And this is pretty difficult to execute reliably, since we have so many builds.

I do understand more where you're coming from now, and it does sound like the way we implemented this fix is not applicable for all use cases. Can we make this a configurable option instead? 

Or if we don't want to remove an existing sstate object from the cache, would it be fine to then overwrite existing sstate cache objects after we finish rebuild it? With our current remote s3 sstate mirror, we already overwrite the remote when the object is different. (I think it might be possible to implement a marker during the decompression phase, and later overwrite the previous sstate object after the local rebuilds. Will need to look more into it. )

Thanks,
Max

On 10/10/24, 5:19 AM, "Alexander Kanavin" <alex.kanavin@gmail.com <mailto:alex.kanavin@gmail.com>> wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

On Tue, 8 Oct 2024 at 20:47, Sobon, Przemyslaw <psobon@amazon.com <mailto:psobon@amazon.com>> wrote:
> The "only one reporting" does not mean the problem does not exist, it may be
> just not as big problem for others as for us. Similar problem exist for DRAM
> memory corruptions? Most of the people don't care about that but for some
> this is important problem, e.g. when you see 1 OS crash per 10 years it is
> not a big deal but if you own 10k servers you see 3 crashes per day. That is
> the scale factor that is important. Max talked about our scale already.
> Summarizing, the manual work is not a solution for us due to scale.

But can you say what your scale actually is? How many sstate objects
are written into the shared cache per day? How often do you see
corruptions?

Basically it helps if you introduce yourselves and your product first,
as this is I think you first time interacting with the community?

> I disagree, we can overwrite bad artifact. Yocto indirectly does that as it has
> to rebuild the package. This is "by design" behavior. And to be honest, there is
> no difference between (1) rebuilding the package every time and (2) overwriting
> sstate cache so any other build can reuse it. Is there any concern around
> uploading such freshly built artifact?

The concern is that your patch does not overwrite the artifact, rather
it deletes it first, and recreates it later. This creates a time
window where a cache object exists, and then it doesn't, and then it
exists again. This will break builds running in parallel in all sorts
of interesting ways, as sstate is not designed for objects
disappearing after they've been checked and confirmed to exist by
bitbake.

For example, when we do need to test cache deletions (for instance in
oe-selftest), we make super-sure that this is done on a private small
test cache that isn't shared with anything, as otherwise there have
been notoriously strange failures in random places.

The other concern I expressed to Max: this auto-recovery sweeps the
'flaky hardware' problem under the rug, instead of being loud and
clear about it. If someone had a perfectly working sstate (and many
people do, including the yocto upstream), and then it started throwing
random fails, they're not going to notice it. If someone had very rare
corruptions and then the rate increased, they're not going to notice
that either. Except when they start to wonder why builds seem to take
longer and longer and longer.

> This is random thing, we are not in control of e.g. DRAM bit flip error, they
> simply happen. To simulate the situation you can inject an error yourself by
> e.g., overwriting the random byte of the zstd file before it is uploaded.

Yes, I saw it. The key issue is this bit in sstate.bbclass before
actually creating the sstate archive:

if sstate_pkg.exists():
touch(sstate_pkg)
return

If sstate item exists, then it will not be replaced, even if it's been
determined to be corrupted earlier.

I don't know yet how to best handle this, but I would want to improve
*reporting* of corrupt sstate before we can decide whether yocto can
do something about it that doesn't make things worse than they are
now.

Then you can take the report and run a script that deletes the
offending items. This all can be automated, and doesn't have to be
executed manually.

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-15 18:25 Yu, Max
@ 2024-10-15 20:32 ` Richard Purdie
  2024-10-15 20:52   ` Alexander Kanavin
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Purdie @ 2024-10-15 20:32 UTC (permalink / raw)
  To: yumx, Alexander Kanavin, Sobon, Przemyslaw
  Cc: openembedded-core@lists.openembedded.org

On Tue, 2024-10-15 at 18:25 +0000, Yu, Max via lists.openembedded.org wrote:
> 
> > But can you say what your scale actually is? How many sstate objects
> > are written into the shared cache per day? How often do you see
> > corruptions?
> 
> Our sstate cache has 25TB and 12M objects. I do not have the
> read/write metrics, but we run ~5k image builds per day, and we were
> seeing the corruption issue about once per two weeks before this
> patch. Our image builds range from 3k - 8k build steps, but these
> numbers become funky since these are not builds from scratch. (These
> corruptions we're facing are likely not bit flips, the example I gave
> was just trying to illustrate how rare events can happen that
> shouldn't just be summarized with "fix your infra". If it makes
> things better or worse, we only started to see these corruption
> issues after moving to kirkstone where they are now compressed with
> pzstd...)
> 
> For some more context, we at AWS use Yocto to build the OS for a lot
> of hardware platforms and a number of smaller component images.
> Cartesian product the two and we end up with needing to build a lot
> of images...
> 
> Specifically related to sstate caches, we have a remote sstate mirror
> setup in s3. All CI builds update the s3 mirror, so we have a large
> number of writers. For all our builds, we setup a local build
> directory per repo (a repo can have multiple), which is where we host
> the local sstate cache, and we parallelize based on build directory.
> 
> > The concern is that your patch does not overwrite the artifact, rather
> > it deletes it first, and recreates it later. This creates a time
> > window where a cache object exists, and then it doesn't, and then it
> > exists again. This will break builds running in parallel in all sorts
> > of interesting ways, as sstate is not designed for objects
> > disappearing after they've been checked and confirmed to exist by
> > bitbake.
> 
> This makes a lot of sense. This is a use case we didn't consider when
> creating the patch and not applicable to us. (Since we parallelize by
> multiple build directories that don't share a local sstate cache.
> Developer builds don't really care about the extra local
> parallelization, while CI builds share the remote sstate cache.)
> 
> > Then you can take the report and run a script that deletes the
> > offending items. This all can be automated, and doesn't have to be
> > executed manually.
> 
> So that's the tricky thing for us. We will have to run the script at
> a time window where no builds are happening. Because when I tried
> deleting corrupted objects from the s3 sstate cache manually, the
> corrupted sstate object ended up being reuploaded by ongoing
> builds... And this is pretty difficult to execute reliably, since we
> have so many builds.

I'm a little bit puzzled here. How would something upload the corrupt
artefact back into the system?

The system has been very carefully designed on how we put artefacts
into the sstate directory. We do it by writing to a temporary file,
then move into place once the files are 100% complete. The move is
atomic even over NFS, something will always win.

Once the files exist, we don't overwrite them. This is essential to
prevent something from half reading them.

My question about "uploading the corrupt" artefact is because this
doesn't happen in what we do out the box.

I know you put everything into s3 and I worry that the s3 code has
races in it somehow letting workers see incomplete files, or
overwriting existing files which differ slightly causing the
corruption.

> I do understand more where you're coming from now, and it does sound
> like the way we implemented this fix is not applicable for all use
> cases. Can we make this a configurable option instead? 
> 
> Or if we don't want to remove an existing sstate object from the
> cache, would it be fine to then overwrite existing sstate cache
> objects after we finish rebuild it?

No, overwriting files is a recipe for disaster. See above, you get
partially complete files which come out as corrupted. We have enough
scale on our autobuilders we could see that ourselves.

>  With our current remote s3 sstate mirror, we already overwrite the
> remote when the object is different. (I think it might be possible to
> implement a marker during the decompression phase, and later
> overwrite the previous sstate object after the local rebuilds. Will
> need to look more into it. )

I think this overwriting may be the source of your problems. You should
not need to or be doing this. Why do you need to?

It is easy to blame bit flips and put a hack into the system to force
an overwrite but I think you have a more fundamental issue going on
which you may want to fix properly.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-15 20:32 ` Richard Purdie
@ 2024-10-15 20:52   ` Alexander Kanavin
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Kanavin @ 2024-10-15 20:52 UTC (permalink / raw)
  To: Richard Purdie
  Cc: yumx, Sobon, Przemyslaw, openembedded-core@lists.openembedded.org

On Tue, 15 Oct 2024 at 22:32, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> > So that's the tricky thing for us. We will have to run the script at
> > a time window where no builds are happening. Because when I tried
> > deleting corrupted objects from the s3 sstate cache manually, the
> > corrupted sstate object ended up being reuploaded by ongoing
> > builds... And this is pretty difficult to execute reliably, since we
> > have so many builds.
>
> I'm a little bit puzzled here. How would something upload the corrupt
> artefact back into the system?

I am also puzzled by this. I was going to suggest that perhaps there's
a rare bug in zstd compressor that deterministically (and
reproducibly) creates a corrupt archive on a certain input, because
ongoing builds can't simply 'reupload' something corrupted that has
been deleted from sstate. But maybe there's some extra proprietary
layer of 'overwriting' and 'synchronization' where all this trouble is
coming from.

Alex


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
@ 2024-10-15 22:43 Yu, Max
  2024-10-15 23:01 ` Richard Purdie
  2024-10-16 10:01 ` Alexander Kanavin
  0 siblings, 2 replies; 12+ messages in thread
From: Yu, Max @ 2024-10-15 22:43 UTC (permalink / raw)
  To: Alexander Kanavin, Richard Purdie
  Cc: Sobon, Przemyslaw, openembedded-core@lists.openembedded.org

> > > So that's the tricky thing for us. We will have to run the script at
> > > a time window where no builds are happening. Because when I tried
> > > deleting corrupted objects from the s3 sstate cache manually, the
> > > corrupted sstate object ended up being reuploaded by ongoing
> > > builds... And this is pretty difficult to execute reliably, since we
> > > have so many builds.
> >
> > I'm a little bit puzzled here. How would something upload the corrupt
> > artefact back into the system?
>
> I am also puzzled by this. I was going to suggest that perhaps there's
> a rare bug in zstd compressor that deterministically (and
> reproducibly) creates a corrupt archive on a certain input, because
> ongoing builds can't simply 'reupload' something corrupted that has
> been deleted from sstate. But maybe there's some extra proprietary
> layer of 'overwriting' and 'synchronization' where all this trouble is
> coming from.

Just dug into some of the details more, maybe this is related to how we implemented the s3 mirror. For context, our Yocto builds work like this for a recipe:
1. download the sstate siginfo+object from s3, into our local sstate cache. (I think this is unmodified yocto logic for mirrors just using SSTATE_MIRRORDIR)
2. build
3. upload the local siginfo+object back to s3. (This only happens if s3's object is different)

The reupload happens when we delete the s3 object between steps 1 and 3, where step 3 will just reupload their local sstate cache objects (corrupted in this case). Especially with how yocto parallelizes tasks, this time window between 1 and 2 is quite large. And with the number of builds that happen, we then have no reliable way of cleaning out s3 sstate objects. Note that, even if we changed step 3 to only upload if the object doesn't exist, we would still have this problem.

Reflecting on our s3 mirror setup, maybe we're doing things in a non-standard way, especially regarding step 3.

How do folks normally update sstate mirrors? Do people usually only have a single/small number of writers? (For our use case, we want to keep the remote sstate mirror very up to date.)

> No, overwriting files is a recipe for disaster. See above, you get
> partially complete files which come out as corrupted.

I understand your concerns with overwriting. What I was suggesting with "overwriting" was more creating a new file and `mv`ing it, which should be atomic. Same with the s3 operations we use to upload, they will not leave a file partially written/overwritten if we trust s3 claims.

> I think this overwriting may be the source of your problems. You should
> not need to or be doing this. Why do you need to?
> 
> It is easy to blame bit flips and put a hack into the system to force
> an overwrite but I think you have a more fundamental issue going on
> which you may want to fix properly.

Like I said, it's probably not something like bit flips, but since we only see these corruptions after moving to kirkstone, it might be related to things like using pzstd or other infra changes. The s3 uploading/updating logic did not change, and we are even using the same boto3 (aws python client) versions. We definitely do also want to fix the corruption; it's just been incredibly hard to narrow down what is causing it... 

Thanks for your help and patience,
Max

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-15 22:43 Yu, Max
@ 2024-10-15 23:01 ` Richard Purdie
  2024-10-16 10:01 ` Alexander Kanavin
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Purdie @ 2024-10-15 23:01 UTC (permalink / raw)
  To: Yu, Max, Alexander Kanavin
  Cc: Sobon, Przemyslaw, openembedded-core@lists.openembedded.org

On Tue, 2024-10-15 at 22:43 +0000, Yu, Max wrote:
> > > > So that's the tricky thing for us. We will have to run the
> > > > script at
> > > > a time window where no builds are happening. Because when I
> > > > tried
> > > > deleting corrupted objects from the s3 sstate cache manually,
> > > > the
> > > > corrupted sstate object ended up being reuploaded by ongoing
> > > > builds... And this is pretty difficult to execute reliably,
> > > > since we
> > > > have so many builds.
> > > 
> > > I'm a little bit puzzled here. How would something upload the
> > > corrupt
> > > artefact back into the system?
> > 
> > I am also puzzled by this. I was going to suggest that perhaps
> > there's
> > a rare bug in zstd compressor that deterministically (and
> > reproducibly) creates a corrupt archive on a certain input, because
> > ongoing builds can't simply 'reupload' something corrupted that has
> > been deleted from sstate. But maybe there's some extra proprietary
> > layer of 'overwriting' and 'synchronization' where all this trouble
> > is
> > coming from.
> 
> Just dug into some of the details more, maybe this is related to how
> we implemented the s3 mirror. For context, our Yocto builds work like
> this for a recipe:
> 1. download the sstate siginfo+object from s3, into our local sstate
> cache. (I think this is unmodified yocto logic for mirrors just using
> SSTATE_MIRRORDIR)
> 2. build
> 3. upload the local siginfo+object back to s3. (This only happens if
> s3's object is different)

Perhaps you could change the 3rd step to look at timestamps and only
upload modified files by local timestamp? If you then delete something
from the cache, that should stop it coming back?

> The reupload happens when we delete the s3 object between steps 1 and
> 3, where step 3 will just reupload their local sstate cache objects
> (corrupted in this case). Especially with how yocto parallelizes
> tasks, this time window between 1 and 2 is quite large. And with the
> number of builds that happen, we then have no reliable way of
> cleaning out s3 sstate objects. Note that, even if we changed step 3
> to only upload if the object doesn't exist, we would still have this
> problem.

How about only if the local objects have been modified?

> 
> Reflecting on our s3 mirror setup, maybe we're doing things in a non-
> standard way, especially regarding step 3.
> 
> How do folks normally update sstate mirrors? Do people usually only
> have a single/small number of writers? (For our use case, we want to
> keep the remote sstate mirror very up to date.)

We tend not to recommend updating sstate. In theory once things are
written there, they shouldn't need to change again. Writing isn't a
problem at all as long as things are not overwritten and appear
atomically.

> > No, overwriting files is a recipe for disaster. See above, you get
> > partially complete files which come out as corrupted.
> 
> I understand your concerns with overwriting. What I was suggesting
> with "overwriting" was more creating a new file and `mv`ing it, which
> should be atomic.

Creating a file with a mv is fine. Overwriting with a mv is not atomic,
at least on NFS.

Consider that a build is downloading a large sstate object, say webkit
or something which is say 100+MB large. If you do a mv operation on the
file over NFS on another node to replace that file, the NFS server does
not cache all 100MB of the old file. At some point the transfer will
corrupt as the content changes.

For small files, it isn't as obvious as the files are usually fully in
cache and completes uninterrupted.

>  Same with the s3 operations we use to upload, they will not leave a
> file partially written/overwritten if we trust s3 claims.

If a user is downloading a large object and another replaces it, will
the first user get the full original object?

> > I think this overwriting may be the source of your problems. You
> > should
> > not need to or be doing this. Why do you need to?
> > 
> > It is easy to blame bit flips and put a hack into the system to
> > force
> > an overwrite but I think you have a more fundamental issue going on
> > which you may want to fix properly.
> 
> Like I said, it's probably not something like bit flips, but since we
> only see these corruptions after moving to kirkstone, it might be
> related to things like using pzstd or other infra changes. The s3
> uploading/updating logic did not change, and we are even using the
> same boto3 (aws python client) versions. We definitely do also want
> to fix the corruption; it's just been incredibly hard to narrow down
> what is causing it... 

I'd advise caution from experience. If you fix the wrong thing, you can
set yourself up for other much harder issues down the road. I'm also
reluctant to add things into core classes to work around problems which
aren't in anything we actually maintain or have access to.

Cheers,

Richard







^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror
  2024-10-15 22:43 Yu, Max
  2024-10-15 23:01 ` Richard Purdie
@ 2024-10-16 10:01 ` Alexander Kanavin
  1 sibling, 0 replies; 12+ messages in thread
From: Alexander Kanavin @ 2024-10-16 10:01 UTC (permalink / raw)
  To: Yu, Max
  Cc: Richard Purdie, Sobon, Przemyslaw,
	openembedded-core@lists.openembedded.org

On Wed, 16 Oct 2024 at 00:43, Yu, Max <yumx@amazon.com> wrote:
> How do folks normally update sstate mirrors? Do people usually only have a single/small number of writers? (For our use case, we want to keep the remote sstate mirror very up to date.)

The standard, recommended setup is that all CI builders write directly
to a shared NFS mount (and read from it, obviously). This has been
stable and proven for a very long time, and is relied heavily on by
the yocto upstream CI itself.

Then this NFS is as well exported directly over http as sstate mirror
for usage by machines that can't be trusted to write into it, e.g.
company developers or even a public URI for 'the world'.

People try all sorts of deviations from this setup, such as having
explicit synchronizations after the fact, but this can introduce
issues, and certainly introduces cache misses and longer builds.

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-10-16 10:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-08 18:47 [OE-core] [PATCH] sstate: remove corrupted artifacts from local mirror Sobon, Przemyslaw
2024-10-10 12:18 ` Alexander Kanavin
  -- strict thread matches above, loose matches on Subject: below --
2024-10-15 22:43 Yu, Max
2024-10-15 23:01 ` Richard Purdie
2024-10-16 10:01 ` Alexander Kanavin
2024-10-15 18:25 Yu, Max
2024-10-15 20:32 ` Richard Purdie
2024-10-15 20:52   ` Alexander Kanavin
2024-10-05  0:44 Max Yu
2024-10-07 11:57 ` [OE-core] " Alexander Kanavin
2024-10-07 20:09   ` Yu, Max
2024-10-07 20:38     ` Alexander Kanavin
2024-10-08  9:37     ` Alexander Kanavin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox