* Bitbake do_unpack checksum?
@ 2014-09-22 15:08 Olivier Dugas
2014-09-22 16:16 ` Burton, Ross
0 siblings, 1 reply; 7+ messages in thread
From: Olivier Dugas @ 2014-09-22 15:08 UTC (permalink / raw)
To: bitbake-devel
Hi all,
[Go to paragraph surrounded by # if in a hurry]
The company I work for is using Yocto (a customized version of dylan). We
recently ran into a small problem with BitBake, and my google-fu seems to
be insufficient to solve it.
I know that when the SRC_URI of a recipe changes, the MD5 and SHA sums
have to be updated. If they are not, BitBake fails with a very verbose
error stating exactly what the new checksums are, so I can easily update
them in the recipe.
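The fetch-time check described above amounts to hashing the downloaded file and comparing the digests with the SRC_URI[md5sum] and SRC_URI[sha256sum] values from the recipe. A minimal Python sketch of that idea, assuming those two checksums; the function names and the error text are hypothetical and are not BitBake's actual fetcher code:

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return the MD5 and SHA-256 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5sum": md5.hexdigest(), "sha256sum": sha256.hexdigest()}


def check_download(path, expected):
    """Hypothetical fetch-time check.

    `expected` maps "md5sum"/"sha256sum" to the values a recipe would carry
    in SRC_URI[md5sum] and SRC_URI[sha256sum]. On mismatch, raise an error
    listing the actual checksums so the recipe can be updated.
    """
    actual = file_digests(path)
    bad = [name for name, value in expected.items() if actual[name] != value]
    if bad:
        lines = [f'SRC_URI[{name}] = "{actual[name]}"' for name in sorted(actual)]
        raise ValueError(
            f"Checksum mismatch ({', '.join(bad)}) for {path}:\n" + "\n".join(lines)
        )
    return actual
```

Note that a check like this only runs when the file is actually fetched; once the tarball sits in the downloads folder, nothing in this sketch looks at it again.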
I also know that a recipe's tarball might change on a server without
its version changing. This is very bad practice, but when it happens we
deal with it.
No, the problem we're having is not about these well-documented cases.
The problem is about rotten bits.
You see, we have a recipe (let's call it foo) that needs to be built from
scratch. BitBake will do_fetch() it and verify the checksums.
Everything's fine; the tarball is saved in Yocto's downloads folder.
BitBake then runs do_unpack() on the tarball and starts building in the tmp
directory. Marvellous. The build succeeds and everybody is happy.
Then, say I want to rebuild a Yocto image after modifying another recipe
(bar). bar is needed by foo, so foo has to be rebuilt. No problem,
everything is fine. foo is not downloaded again since it's already in
downloads; there's a do_unpack() and so on.
Here's the problem. A week later, I modify bar again, so foo needs
to be rebuilt. In the meantime a bit on the hard drive has flipped, so the
checksum of foo's tarball no longer matches. Apparently, from what I have
read, BitBake's do_unpack does not verify the checksum, as it's only
validated during do_fetch. Building and installing foo still succeeds,
because the rotten bit only affects runtime behaviour. The image is created
as if nothing had gone wrong and we install it. Then comes the crash; we
search a lot and finally find the issue. A simple bitbake -c cleanall foo
and the problem disappears.
#
Now, my question: would it be hard for BitBake's contributors to add a
checksum validation to the do_unpack step? Hard-drive errors like
this can occur, and the pain described above would be so easily
avoided by a quick checksum...
#
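The request amounts to re-hashing the file as it currently is on disk before extracting it. A minimal sketch, assuming a single expected SHA-256 value and a tarball that Python's tarfile module can open; verified_unpack is a hypothetical helper, not an existing BitBake function:

```python
import hashlib
import tarfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file as it is on disk right now, not a cached value."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_unpack(tarball, expected_sha256, destdir):
    """Hypothetical do_unpack variant: refuse to extract a file whose
    current contents no longer match the recipe's checksum."""
    actual = sha256_of(tarball)
    if actual != expected_sha256:
        raise RuntimeError(
            f"{tarball}: sha256 {actual} != expected {expected_sha256}; "
            "the downloaded file may be corrupted, delete it and fetch again"
        )
    with tarfile.open(tarball, "r:*") as tar:
        # Use the safer "data" extraction filter where this Python has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(destdir, filter="data")
        else:
            tar.extractall(destdir)
```

The cost of this design is one full read and hash of the tarball on every unpack, which is the overhead discussed later in the thread.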
Many thanks.
Lee
P.S.: "If I had more time, I would have written a shorter letter."
* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 16:16 UTC (permalink / raw)
To: Olivier Dugas; +Cc: bitbake-devel
On 22 September 2014 16:08, Olivier Dugas <dugaso@sonatest.com> wrote:
> Now, my question: would it be hard for BitBake's contributors to add a
> checksum validation to the do_unpack step? Hard-drive errors like this
> can occur, and the pain described above would be so easily avoided by a
> quick checksum...
No, it wouldn't be hard.
https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571
Ross
* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 16:18 UTC (permalink / raw)
To: Olivier Dugas; +Cc: bitbake-devel
On 22 September 2014 17:16, Burton, Ross <ross.burton@intel.com> wrote:
> No, it wouldn't be hard.
>
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571
I hit send a little early there. 5571 is a related issue, but if
you're using a disk that suffers from random bit flips, do you
want to trust it to build a file system image that is likely
corrupted? By extension, we should checksum every file we generate
just in case it gets corrupted too...
Ross
* Re: Bitbake do_unpack checksum?
From: Olivier Dugas @ 2014-09-22 17:04 UTC (permalink / raw)
To: Burton, Ross; +Cc: bitbake-devel
Hi Ross, and thank you for taking the time,
I agree that if the disk might suffer from random bit flips, I couldn't
trust it. In fact, as proposed on the Build Performance wiki page
(https://wiki.yoctoproject.org/wiki/Build_Performance), I optimised the
build for performance, knowing that the file system image would be built
faster at the costs implied, among them that I could not fully trust
the image.
The only thing we keep on a totally healthy disk (because it needs to
stay safe from bit flips) is the downloads folder with all the
tarballs. Still, as you know, bit flips can occur there too, granted
much less frequently. The thing is, this is the first bit flip we have
seen in maybe 3 years. So it's not really a problem of a disk about to
fail, but rather of random flips caused by, say, solar wind or
something!
I personally would be satisfied with the approach Richard Purdie suggested in
bug 5571 (comment 2), that is, putting the checksum into the .done file.
I believe this would indeed avoid errors like the one we got here. Was
it implemented? If somebody did this, do you know the commit ID, so I
can try to cherry-pick it?
Best regards,
*Olivier*
* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 19:20 UTC (permalink / raw)
To: Olivier Dugas; +Cc: bitbake-devel
On 22 September 2014 18:04, Olivier Dugas <dugaso@sonatest.com> wrote:
> I personally would be satisfied with the approach Richard Purdie suggested in bug
> 5571 (comment 2), that is, putting the checksum into the .done file. I
> believe this would indeed avoid errors like the one we got here. Was it
> implemented? If somebody did this, do you know the commit ID, so I can try
> to cherry-pick it?
Comment 2 is designed to solve a different problem and won't catch the
tarball on disk getting corrupted because it will simply be comparing
a cached checksum in the .done file with the stated checksum in the
recipe.
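The distinction can be made concrete. A checksum cached in the .done stamp only lets BitBake compare two stored strings, so the tarball's bytes are never read; only re-hashing the file notices on-disk corruption. A sketch in which the stamp layout (a file holding just the SHA-256 recorded at fetch time) and both function names are hypothetical:

```python
import hashlib
from pathlib import Path


def stamp_matches_recipe(done_stamp, recipe_sha256):
    """Comment-2 style check: compare the checksum cached in the .done
    stamp with the recipe's value. Cheap, but never reads the tarball."""
    return Path(done_stamp).read_text().strip() == recipe_sha256


def file_matches_recipe(tarball, recipe_sha256):
    """Re-hash the tarball as it currently is on disk."""
    return hashlib.sha256(Path(tarball).read_bytes()).hexdigest() == recipe_sha256
```

The first check catches a recipe whose stated checksum changed since the download (for example, a new SRC_URI) without any hashing; after a rotten bit it still returns True. The second catches the rotten bit, at the price of reading the whole file.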
I'm not sure which points on the Build Performance page mean you
couldn't trust the resulting image. Some tweaks risk data loss in the
event of a power failure (i.e. no journal, high write delay), but in
that situation the corrupted disk can simply be reformatted, as it only
contains built objects that can be entirely regenerated.
Ross
* Re: Bitbake do_unpack checksum?
From: Olivier Dugas @ 2014-09-22 19:37 UTC (permalink / raw)
To: Burton, Ross; +Cc: bitbake-devel
My bad, you're right. I would need to recompute the checksum at every
do_unpack().
I know that no journal and a high write delay could corrupt my disk. In
that event I would reformat and rebuild the image, no big deal. What
seems like a big deal to me, though, is a healthy hard drive that
randomly flips a bit in one of my downloaded tarballs once every 2 years,
while BitBake does not verify the checksum just in case.
Put this way, I realize that having BitBake checksum every already
downloaded tarball in the system could indeed add significant overhead,
as Richard Purdie said, even though I thought a checksum was a fast
operation... So I'm stuck. What would be the best solution, then? I
can't see any simple solution other than the one you proposed in
#5571.
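Whether re-hashing on every unpack is affordable mostly depends on how fast the machine can read and hash the tarballs involved. A rough way to measure it, as a hypothetical helper that times one SHA-256 pass over a file (results depend on CPU, disk and page cache, so treat any number as indicative only):

```python
import hashlib
import os
import time


def hash_throughput(path, chunk_size=1 << 20):
    """Hash `path` once with SHA-256 and return (hex digest, MB/s).

    The rate includes the time to read the file, so a second run on the
    same file will usually look faster because it comes from the page cache.
    """
    size = os.path.getsize(path)
    h = hashlib.sha256()
    start = time.perf_counter()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    elapsed = time.perf_counter() - start
    rate = (size / 1e6) / elapsed if elapsed > 0 else float("inf")
    return h.hexdigest(), rate
```

Calling it on a few files from the downloads directory, and scaling by the number of tarballs a typical rebuild unpacks, gives a rough estimate of the added cost; cold-cache numbers are the relevant ones when the files really come from disk.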
What do you think?
*Olivier*
* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 22:20 UTC (permalink / raw)
To: Olivier Dugas; +Cc: bitbake-devel
On 22 September 2014 20:37, Olivier Dugas <dugaso@sonatest.com> wrote:
> I know that no journal and a high write delay could corrupt my disk. In that
> event I would reformat and rebuild the image, no big deal. What seems like a
> big deal to me, though, is a healthy hard drive that randomly flips a bit in
> one of my downloaded tarballs once every 2 years, while BitBake does not
> verify the checksum just in case.
>
> Put this way, I realize that having BitBake checksum every already
> downloaded tarball in the system could indeed add significant overhead, as
> Richard Purdie said, even though I thought a checksum was a fast
> operation... So I'm stuck. What would be the best solution, then? I can't
> see any simple solution other than the one you proposed in #5571.
I think that if you're worried about a bit flipping once every two years,
then there are far more important places it could flip than the source
tarballs. Tarballs, zip files, etc. have internal CRCs, so a flipped bit
will be detected. A bit flipping inside a compiled object before it
reaches a CRC-protected package won't be detected until the program
crashes or executes incorrectly.
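The CRC argument can be checked with Python's standard gzip module: gzip stores a CRC-32 of the uncompressed data in its trailer, and CRC-32 detects every single-bit error, so a flipped bit in the payload makes decompression fail instead of silently returning different bytes. (This relies on the compression layer; the tar format itself only checksums each member's header, not the file contents.) A small demonstration with made-up data:

```python
import gzip
import zlib


def flip_bit(data, byte_index, bit=0):
    """Return a copy of `data` with one bit inverted."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


def corruption_detected(blob):
    """True if gzip refuses the data (bad CRC, bad length or broken stream)."""
    try:
        gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        return True
    return False


# 16 KiB of sample "file contents".
original = bytes(range(256)) * 64
# compresslevel=0 stores the data uncompressed inside the gzip container, so
# the flipped bit lands in the file data itself and only the trailer's CRC-32
# can catch it. The flip is placed mid-payload because header fields such as
# mtime are not covered by that CRC.
packed = gzip.compress(original, compresslevel=0)
damaged = flip_bit(packed, len(packed) // 2)
print(corruption_detected(packed), corruption_detected(damaged))
```

compresslevel=0 keeps the demonstration deterministic; with real compression a flipped bit usually breaks the deflate stream itself, and the CRC check remains as a backstop.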
As far as I'm aware this is mostly a hypothetical problem: hard drives
use CRCs to detect errors when reading sectors and remap sectors as they
go bad. Wikipedia says "10 non-recoverable read errors in every 10^16
bits", and these are reported to the operating system as bad sector
errors, not as silent corruption. I'm happy to live with that error
rate.
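To put the quoted figure in perspective, a back-of-the-envelope conversion in decimal units (this is only arithmetic on the quoted spec, not a measurement of any drive):

```python
# Quoted drive spec: 10 non-recoverable read errors in every 10**16 bits read.
ERRORS = 10
BITS_READ = 10**16

bits_per_error = BITS_READ // ERRORS        # 10**15 bits
bytes_per_error = bits_per_error // 8       # 1.25 * 10**14 bytes
tb_per_error = bytes_per_error / 10**12     # decimal terabytes

print(f"~{tb_per_error:g} TB read per non-recoverable error")
```

In other words, roughly 125 TB of reads per reported error, and per the message above such errors surface as bad-sector errors rather than silent corruption.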
Ross