* Bitbake do_unpack checksum?
From: Olivier Dugas @ 2014-09-22 15:08 UTC
  To: bitbake-devel

Hi all,
[Go to paragraph surrounded by # if in a hurry]

The company I work for is using Yocto (a customized version of the dylan
release). We recently ran into a small problem with bitbake, and my
google-fu seems to be insufficient to solve it.

I know that when the SRC_URI of a recipe is changed, the MD5 and SHA sums
have to be updated. If they are not, bitbake fails with a very verbose
error that states exactly what the new checksums are, so I can easily
update them in the recipe.

I also know that a tarball for a recipe might change on a server without
its version changing. This is very bad practice, but when it happens we
deal with it.

No, the problem we're having is not one of these well-documented cases.
The problem is about rotten bits.

You see, we have a recipe (let's call it foo) that needs to be built from
scratch. Bitbake runs do_fetch() on it and verifies the checksums.
Everything's fine, and the tarball is saved in Yocto's downloads folder.
Bitbake then runs do_unpack() on the tarball and starts building in the
tmp directory. Marvellous. The build succeeds and everybody is happy.

Then, say I want to rebuild a Yocto image after modifying another recipe
(bar). foo depends on bar, so foo has to be rebuilt. No problem,
everything is fine. foo is not downloaded again since it's already in
downloads; there's a do_unpack() and so on.

Here's the problem. A week later, I modify bar again, so foo needs to be
rebuilt. A bit on the hard drive has flipped, so the checksum of foo's
tarball no longer matches. Apparently, from what I have read, Bitbake's
do_unpack does not verify the checksum, as it is only validated during
do_fetch. Building and installing foo still succeeds, because the rotten
bit only affects runtime. The image is created as if no problem had
occurred, and we install it. Then there's the crash; we search for a long
time and finally find the issue. A simple -c cleanall foo makes the
problem disappear.

#
Now, my question: would it be hard for bitbake's contributors to add
checksum validation to the do_unpack step? Hard-drive errors like this
can occur, and the pain described above would be so easily avoided by a
quick checksum...
#
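
A minimal sketch of the kind of check being asked for, written in plain
Python rather than taken from BitBake's fetcher code. The function names
sha256_of and verify_before_unpack are hypothetical, and the expected
digest is assumed to be the one recorded in the recipe:

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large tarballs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_before_unpack(path, expected_sha256):
    """Raise if the file on disk no longer matches the recorded checksum."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"{path}: expected {expected_sha256}, got {actual}")
```

Calling verify_before_unpack() just before extracting the archive would
turn a silently corrupted tarball into an immediate, explicit failure, at
the cost of reading the whole file once more.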

Many thanks.
Lee
P.S.: "If I had more time, I would have written a shorter letter"


* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 16:16 UTC
  To: Olivier Dugas; +Cc: bitbake-devel

On 22 September 2014 16:08, Olivier Dugas <dugaso@sonatest.com> wrote:
> Now, my question : Would it be hard for bitbake's contributors to add a
> checksum validation during the do_unpack step? Hard-drive errors like this
> can occur, and such above-mentioned pain would be so easily avoided by a
> quick checksum...

No, it wouldn't be hard.

https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571

Ross


* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 16:18 UTC
  To: Olivier Dugas; +Cc: bitbake-devel

On 22 September 2014 17:16, Burton, Ross <ross.burton@intel.com> wrote:
> No, it wouldn't be hard.
>
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571

I hit send a little early there.  5571 is a related issue, but if you
have a disk which is suffering from random bit flips, then do you want
to trust it to build a file system image that is likely to be
corrupted?  By extension we should checksum every file we generate
just in case they get corrupted too...

Ross


* Re: Bitbake do_unpack checksum?
From: Olivier Dugas @ 2014-09-22 17:04 UTC
  To: Burton, Ross; +Cc: bitbake-devel

Hi Ross, and thank you for taking the time.
I agree that if the disk might suffer from random bit flips, I couldn't
trust it. In fact, as proposed there
(https://wiki.yoctoproject.org/wiki/Build_Performance), I optimised the
build performance knowing that the file system image would be built
faster at a cost, which is (among other things) that I could not trust
the image.

The only thing that is kept on a totally healthy disk (because we need
to keep it away from bit flips) is the download folder with all the
tarballs. Still, you know that bit flips can occur there too, granted
that they are much less frequent. The point here is that this is the
first time in maybe 3 years that a bit flip has occurred. So it's not
really a case of a disk about to fail, but more of random flips caused
by, say, solar wind or something!

I personally would be satisfied by the point made by Richard Purdie in
bug 5571 (comment 2), that is, putting the checksum into the .done file.
I believe this would indeed avoid errors like the one we got here. Was
it implemented? If somebody did this, do you know the commit id, so I
can try to cherry-pick it?

Best regards,

*Olivier*



* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 19:20 UTC
  To: Olivier Dugas; +Cc: bitbake-devel

On 22 September 2014 18:04, Olivier Dugas <dugaso@sonatest.com> wrote:
> I personally would be satisfied by the point made by Richard Purdie in bug
> 5571 (comment 2), that is putting the checksum into the .done file. I
> believe this would indeed avoid errors like the one we got here. Was it
> implemented? If somebody did this, do you know the commit id, so I can try
> to cherry pick it.

Comment 2 is designed to solve a different problem and won't catch the
tarball on disk getting corrupted because it will simply be comparing
a cached checksum in the .done file with the stated checksum in the
recipe.
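
To make the distinction concrete, here is a small sketch in plain Python.
It is not BitBake code: cached_check and recomputed_check are
hypothetical helpers, and the .done file is modelled as nothing more than
a stored checksum string.

```python
import hashlib

def cached_check(done_checksum, recipe_checksum):
    """Compare the checksum cached at fetch time with the recipe's value.

    The tarball's current bytes are never read, so later corruption
    of the file on disk is invisible to this check.
    """
    return done_checksum == recipe_checksum

def recomputed_check(tarball_bytes, recipe_checksum):
    """Re-hash the tarball's current contents and compare with the recipe."""
    return hashlib.sha256(tarball_bytes).hexdigest() == recipe_checksum

original = b"contents of foo-1.0.tar.gz at fetch time"
recipe_checksum = hashlib.sha256(original).hexdigest()
done_checksum = recipe_checksum            # written once, when the fetch succeeded

corrupted = bytearray(original)
corrupted[5] ^= 0x01                       # a single bit flips on disk later

print(cached_check(done_checksum, recipe_checksum))         # True: nothing noticed
print(recomputed_check(bytes(corrupted), recipe_checksum))  # False: corruption caught
```

Only the second check reads the file again, which is also why it is the
one that costs I/O time on every unpack.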

I'm not sure which points in the Build Performance page mean you
couldn't trust the resulting image.  There are some tweaks that mean
data loss in the event of power failure (i.e. no journal, high write
delay) but in this situation the corrupted disk can be reformatted as
it only contains built objects and can be entirely regenerated.

Ross


* Re: Bitbake do_unpack checksum?
From: Olivier Dugas @ 2014-09-22 19:37 UTC
  To: Burton, Ross; +Cc: bitbake-devel

My bad, you're right. I would need to recompute the checksum at every
do_unpack().
I know that no journal and a high write delay could corrupt my disk. In
that event I would reformat and rebuild the image, no big deal. What
does seem like a big deal to me, though, is having a healthy hard drive
that will randomly flip a bit in one of my downloaded tarballs once
every 2 years, and not having bitbake verify the checksum just in case.

Now that I put it this way, I realize that having bitbake checksum
every already-downloaded tarball in the system could indeed add
significant overhead, as Richard Purdie said, even though I thought
that a checksum was a fast operation... So I'm stuck. What would be the
best solution then? I can't see any simple solution other than the one
you proposed in #5571.

What do you think?

*Olivier*



* Re: Bitbake do_unpack checksum?
From: Burton, Ross @ 2014-09-22 22:20 UTC
  To: Olivier Dugas; +Cc: bitbake-devel

On 22 September 2014 20:37, Olivier Dugas <dugaso@sonatest.com> wrote:
> I know that no journal and high write delay could corrupt my disk. In this
> event I would reformat and rebuild the image, no big deal. What seems to me
> like a big deal though is to have a healthy hard drive that will once every
> 2 years flip a bit randomly on one of my downloaded tarballs, and not having
> bitbake verify the checksum just in case.
>
> When I say it this way, I realize having bitbake doing checksum for every
> already downloaded tarball in the system could indeed add significant
> overhead, like said by Richard Purdie, even if I thought that a checksum was
> a fast operation... So I'm stuck. What would be the best solution then? I
> can't see any other simple solution than the one you proposed in #5571.

I think that if you're worrying about a bit flipping once in two years
then there are far more important places it could flip than the source
tarball archives.  tarballs/zip files etc have CRCs internally so a
flipped bit will be detected.  A bit flipping inside a compiled object
before it reaches a CRC'd package won't be detected until it crashes
or executes incorrectly.
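
As a small illustration of the internal CRC point, the following Python
sketch uses the standard gzip module. It covers gzip-compressed data
only, the payload is just a stand-in for real source files, and
gzip_is_intact is a hypothetical helper name:

```python
import gzip
import zlib

def gzip_is_intact(blob):
    """Return True if the gzip data decompresses and passes its CRC-32 check."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header or CRC mismatch).
        return False

payload = bytes(range(256))               # stand-in for source file contents
good = gzip.compress(payload)

damaged = bytearray(good)
damaged[len(damaged) // 2] ^= 0x01        # flip one bit in the compressed body

print(gzip_is_intact(good))               # True
print(gzip_is_intact(bytes(damaged)))     # False
```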

As far as I'm aware this is mostly a hypothetical problem: hard drives
employ CRC to detect errors when reading sectors and will remap bad
sectors as they appear.  Wikipedia says "10 non-recoverable read errors in
every 10^16 bits" and they'll be reported to the operating system as a
bad sector error and not silent corruption.  I'm happy to live with
that error rate.
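
Taking the quoted figure at face value, a quick back-of-the-envelope
calculation shows how much data would be read per expected
non-recoverable error. The variable names are only illustrative, and
decimal terabytes (10^12 bytes) are assumed:

```python
errors = 10
bits_read = 10**16

bits_per_error = bits_read // errors          # 10**15 bits
bytes_per_error = bits_per_error // 8         # 1.25 * 10**14 bytes
terabytes_per_error = bytes_per_error / 10**12

print(terabytes_per_error)                    # 125.0
```

That is roughly one reported read error per 125 TB read, which puts the
"once every two years" scenario in perspective for a downloads folder.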

Ross

