Buildroot Archive on lore.kernel.org
* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
@ 2017-09-09 17:17 Petr Kulhavy
  2017-09-09 20:08 ` Thomas Petazzoni
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Kulhavy @ 2017-09-09 17:17 UTC
  To: buildroot

Update linuxptp to the latest version, from September 1, 2017.
This update brings bug fixes and minor enhancements.

Signed-off-by: Petr Kulhavy <brain@jikos.cz>
---
 package/linuxptp/linuxptp.hash | 4 ++--
 package/linuxptp/linuxptp.mk   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/package/linuxptp/linuxptp.hash b/package/linuxptp/linuxptp.hash
index ccda2d6..de28ee4 100644
--- a/package/linuxptp/linuxptp.hash
+++ b/package/linuxptp/linuxptp.hash
@@ -1,2 +1,2 @@
-# Locally computed:
-sha256	b8190ab71a99f1dc32847f33cb301d2464d3f9e5f4c51300d55589aff42e8b3f  linuxptp-97c351cafd7327fd28047580c9e2528a6f7e742b.tar.gz
+# No hash for linuxptp, cloned from a remote repo
+none	xxx								  linuxptp-17c9787b1d6891636b5be9e4e5a08278b44e9a7a.tar.gz
diff --git a/package/linuxptp/linuxptp.mk b/package/linuxptp/linuxptp.mk
index bf2176c..38d8e09 100644
--- a/package/linuxptp/linuxptp.mk
+++ b/package/linuxptp/linuxptp.mk
@@ -4,7 +4,7 @@
 #
 ################################################################################
 
-LINUXPTP_VERSION = 97c351cafd7327fd28047580c9e2528a6f7e742b
+LINUXPTP_VERSION = 17c9787b1d6891636b5be9e4e5a08278b44e9a7a
 LINUXPTP_SITE_METHOD = git
 LINUXPTP_SITE = git://git.code.sf.net/p/linuxptp/code
 LINUXPTP_LICENSE = GPL-2.0+
-- 
2.7.4

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-09 17:17 [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version Petr Kulhavy
@ 2017-09-09 20:08 ` Thomas Petazzoni
  2017-09-09 20:53   ` Petr Kulhavy
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Petazzoni @ 2017-09-09 20:08 UTC
  To: buildroot

Hello,

On Sat,  9 Sep 2017 19:17:29 +0200, Petr Kulhavy wrote:

> diff --git a/package/linuxptp/linuxptp.hash b/package/linuxptp/linuxptp.hash
> index ccda2d6..de28ee4 100644
> --- a/package/linuxptp/linuxptp.hash
> +++ b/package/linuxptp/linuxptp.hash
> @@ -1,2 +1,2 @@
> -# Locally computed:
> -sha256	b8190ab71a99f1dc32847f33cb301d2464d3f9e5f4c51300d55589aff42e8b3f  linuxptp-97c351cafd7327fd28047580c9e2528a6f7e742b.tar.gz
> +# No hash for linuxptp, cloned from a remote repo
> +none	xxx								  linuxptp-17c9787b1d6891636b5be9e4e5a08278b44e9a7a.tar.gz

Why are you removing the hash?

Buildroot's mechanism to clone from a Git repository and generate a
tarball is reproducible, so we can use hashes and we do use them.

Could you fix this and send an updated version?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-09 20:08 ` Thomas Petazzoni
@ 2017-09-09 20:53   ` Petr Kulhavy
  2017-09-10  6:04     ` Thomas Petazzoni
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Kulhavy @ 2017-09-09 20:53 UTC
  To: buildroot

Hi Thomas,

On 09/09/17 22:08, Thomas Petazzoni wrote:
> Hello,
>
> On Sat,  9 Sep 2017 19:17:29 +0200, Petr Kulhavy wrote:
>
>> diff --git a/package/linuxptp/linuxptp.hash b/package/linuxptp/linuxptp.hash
>> index ccda2d6..de28ee4 100644
>> --- a/package/linuxptp/linuxptp.hash
>> +++ b/package/linuxptp/linuxptp.hash
>> @@ -1,2 +1,2 @@
>> -# Locally computed:
>> -sha256	b8190ab71a99f1dc32847f33cb301d2464d3f9e5f4c51300d55589aff42e8b3f  linuxptp-97c351cafd7327fd28047580c9e2528a6f7e742b.tar.gz
>> +# No hash for linuxptp, cloned from a remote repo
>> +none	xxx								  linuxptp-17c9787b1d6891636b5be9e4e5a08278b44e9a7a.tar.gz
> Why are you removing the hash?
Well, I was just following the BR manual :-), which says that "none"
is used for cloned repositories.

> Buildroot's mechanism to clone from a Git repository and generate a
> tarball is reproducible, so we can use hashes and we do use them.
Is there a command to just clone and compress the repo via BR?
The <package>-extract make target fails if the hash doesn't exist and 
consequently deletes the temporary files.

Petr

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-09 20:53   ` Petr Kulhavy
@ 2017-09-10  6:04     ` Thomas Petazzoni
  2017-09-10  9:24       ` Yann E. MORIN
  2017-09-10  9:57       ` Petr Kulhavy
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Petazzoni @ 2017-09-10  6:04 UTC
  To: buildroot

Hello,

On Sat, 9 Sep 2017 22:53:06 +0200, Petr Kulhavy wrote:

> > Why are you removing the hash?
> Well, I was just following the BR manual :-), which says that "none"
> is used for cloned repositories.

From the manual:

"""
Hashes are currently checked for files fetched from http/ftp servers,
Git repositories, files copied using scp and local files. Hashes are
not checked for other version control systems (such as Subversion, CVS,
etc.) because Buildroot currently does not generate reproducible
tarballs when source code is fetched from such version control systems.
"""

But I indeed see:

"""
The none hash type is reserved to those archives downloaded from a
repository, like a git clone, a subversion checkout...
"""

So we have to fix the manual :)

> Is there a command to just clone and compress the repo via BR?
> The <package>-extract make target fails if the hash doesn't exist and 
> consequently deletes the temporary files.

Yeah, it's a bit annoying. If you put a none hash temporarily, then you
can have the tarball downloaded, calculate its hash, and add it. We
also had proposals like https://patchwork.ozlabs.org/patch/791357/ to
help with this.
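Thomas's workaround ends with computing the tarball's SHA-256 and writing it into the package's `.hash` file, in the `sha256  <digest>  <file name>` layout visible in the patch (where such entries are marked with a `# Locally computed:` comment). A minimal Python sketch of that last step, assuming the tarball Buildroot generated is already on disk; the helper names are hypothetical, this is not a Buildroot tool, and the two-space separator is an assumption (the patch itself uses a tab after the hash type):

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file through SHA-256 so a large tarball need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_file_line(path: Path) -> str:
    """Format one .hash entry: hash type, hex digest, then the archive's base name."""
    return f"sha256  {sha256_of_file(path)}  {path.name}"
```

For example, `print(hash_file_line(Path("linuxptp-17c9787b1d6891636b5be9e4e5a08278b44e9a7a.tar.gz")))` would print one line to paste under the `# Locally computed:` comment, assuming the tarball sits in the current directory.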

Best regards,

Thomas

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-10  6:04     ` Thomas Petazzoni
@ 2017-09-10  9:24       ` Yann E. MORIN
  2017-09-10 10:31         ` Petr Kulhavy
  2017-09-10  9:57       ` Petr Kulhavy
  1 sibling, 1 reply; 11+ messages in thread
From: Yann E. MORIN @ 2017-09-10  9:24 UTC
  To: buildroot

On 2017-09-10 08:04 +0200, Thomas Petazzoni spake thusly:
> On Sat, 9 Sep 2017 22:53:06 +0200, Petr Kulhavy wrote:
> > Is there a command to just clone and compress the repo via BR?
> > The <package>-extract make target fails if the hash doesn't exist and 
> > consequently deletes the temporary files.
> Yeah, it's a bit annoying. If you put a none hash temporarily, then you
> can have the tarball downloaded, calculate its hash, and add it. We
> also had proposals like https://patchwork.ozlabs.org/patch/791357/ to
> help with this.

IIRC, I was opposed to that change, because we want the user to go and
get the hash as provided by upstream (e.g. in a release email).

Having the infra pre-calculate the hash locally defeats the very purpose
of the hashes: check that what we got is what upstream provides.

We accept local calculation of hashes only in the case that upstream
does not provide it.

As an aside, the patch does two things, so it should be split.

I'll go and reply to that effect on the original patch.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-10  6:04     ` Thomas Petazzoni
  2017-09-10  9:24       ` Yann E. MORIN
@ 2017-09-10  9:57       ` Petr Kulhavy
  1 sibling, 0 replies; 11+ messages in thread
From: Petr Kulhavy @ 2017-09-10  9:57 UTC
  To: buildroot

Hi Thomas,

On 10/09/17 08:04, Thomas Petazzoni wrote:
> Hello,
>
> On Sat, 9 Sep 2017 22:53:06 +0200, Petr Kulhavy wrote:
>
>>> Why are you removing the hash?
>> Well, I was just following the BR manual :-), which says that "none"
>> is used for cloned repositories.
> From the manual:
>
> """
> Hashes are currently checked for files fetched from http/ftp servers,
> Git repositories, files copied using scp and local files. Hashes are
> not checked for other version control systems (such as Subversion, CVS,
> etc.) because Buildroot currently does not generate reproducible
> tarballs when source code is fetched from such version control systems.
> """

Even here I would be more specific. To me, a file fetched from a GIT
repository is not the same as a clone of a GIT repository.


Regards
Petr

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-10  9:24       ` Yann E. MORIN
@ 2017-09-10 10:31         ` Petr Kulhavy
  2017-09-10 18:18           ` Yann E. MORIN
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Kulhavy @ 2017-09-10 10:31 UTC
  To: buildroot

Hi Yann, Thomas,

On 10/09/17 11:24, Yann E. MORIN wrote:
> On 2017-09-10 08:04 +0200, Thomas Petazzoni spake thusly:
>> On Sat, 9 Sep 2017 22:53:06 +0200, Petr Kulhavy wrote:
>>> Is there a command to just clone and compress the repo via BR?
>>> The <package>-extract make target fails if the hash doesn't exist and
>>> consequently deletes the temporary files.
>> Yeah, it's a bit annoying. If you put a none hash temporarily, then you
>> can have the tarball downloaded, calculate its hash, and add it. We
>> also had proposals like https://patchwork.ozlabs.org/patch/791357/ to
>> help with this.
> IIRC, I was opposed to that change, because we want the user to go and
> get the hash as provided by upstream (e.g. in a release email).
>
> Having the infra pre-calculate the hash locally defeats the very purpose
> of the hashes: check that what we got is what upstream provides.
Doesn't the idea of a hash of a cloned and zipped GIT repo go a little
bit against this?
I mean, I have never seen any upstream provide a hash for a specific
clone of a repo.
In fact, that is what the GIT hash provides, in a slightly different form.

So I must say I'm somewhat missing the point of providing a hash for a
cloned and zipped GIT repo.
What is the hash trying to protect?

On the contrary, I even think it is a wrong approach. The zip is created
locally after the clone. And the output, or the hash if you want,
depends on the zip tool used and its settings (compression level, etc.).
So if someone uses a tool with a different default compression level,
or, for instance, gzip gets optimized, or whatever, the hash will be
different, even if the cloned repo is the same.
(AFAIK there is no standard defining how well gzip should compress, nor 
does gzip guarantee for a given input an equivalent output between 
different future versions of gzip)
So in fact the hash on a GIT repo in BR compares the zip tool I used to 
create the hash file and the tool that the BR user has installed on his 
machine.
And that is surely not what you want to do, is it?

For GIT the SHA1 value together with "git fsck" seems to do the job. See 
the answer in this post:
https://stackoverflow.com/questions/31550828/verify-git-integrity
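To illustrate the "slightly different form" Petr refers to: Git names a file's content (a blob) by the SHA-1 of a short header followed by the raw bytes, and trees and commits are hashed the same way over the IDs they reference, so a commit SHA-1 transitively pins the whole tree. A minimal Python sketch of the blob case only; it mirrors what `git hash-object` computes for a file, but the function name is hypothetical and it is an illustration, not a replacement for `git fsck`:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    Git hashes the header b"blob <size in bytes>\\0" followed by the raw bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

This only pins content when the package version is a commit SHA-1, as `LINUXPTP_VERSION` is in this patch; a tag name carries no such guarantee.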

Regards
Petr

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-10 10:31         ` Petr Kulhavy
@ 2017-09-10 18:18           ` Yann E. MORIN
  2017-09-10 18:40             ` Thomas Petazzoni
  2017-09-10 23:30             ` Petr Kulhavy
  0 siblings, 2 replies; 11+ messages in thread
From: Yann E. MORIN @ 2017-09-10 18:18 UTC
  To: buildroot

Petr, all,

On 2017-09-10 12:31 +0200, Petr Kulhavy spake thusly:
> On 10/09/17 11:24, Yann E. MORIN wrote:
> >On 2017-09-10 08:04 +0200, Thomas Petazzoni spake thusly:
> >>On Sat, 9 Sep 2017 22:53:06 +0200, Petr Kulhavy wrote:
> >>>Is there a command to just clone and compress the repo via BR?
> >>>The <package>-extract make target fails if the hash doesn't exist and
> >>>consequently deletes the temporary files.
> >>Yeah, it's a bit annoying. If you put a none hash temporarily, then you
> >>can have the tarball downloaded, calculate its hash, and add it. We
> >>also had proposals like https://patchwork.ozlabs.org/patch/791357/ to
> >>help with this.
> >IIRC, I was opposed to that change, because we want the user to go and
> >get the hash as provided by upstream (e.g. in a release email).
> >
> >Having the infra pre-calculate the hash locally defeats the very purpose
> >of the hashes: check that what we got is what upstream provides.
> Doesn't the idea of a hash of a cloned and zipped GIT repo go a little bit
> against this?
> I mean, I have never seen any upstream provide a hash for a specific clone
> of a repo.

Indeed no. This is one of the cases where a locally computed hash is needed.

> In fact, that is what the GIT hash provides, in a slightly different form.

Except when one git-clones a tag; unlike a sha1, a tag can be changed;
see below.

> So I must say I'm somewhat missing the point of providing a hash for a
> cloned and zipped GIT repo.
> What is the hash trying to protect?

Overall, the hash is there for three reasons:

 1- be sure that what we download is what we expect, to avoid
    man-in-the-middle attacks, especially on security-sensitive
    packages: ca-certificates, openssh, dropbear, etc...

 2- be sure that what we download is what we expect, to avoid silent
    corruption of the downloaded blob, or blobs fscked-up by
    intermediate CDNs (already seen!)

 3- detect when upstream completely messes up, and redoes a release,
    like regenerating a release tarball, or re-tagging another commit,
    after the previous one went public.

The last one is problematic, because then we can no longer ensure
reproducibility of a build. There's nothing we can do in this case, of
course, except pester upstream to never do that again. But at least we
catch it and can act accordingly; it is not a silent change of
behaviour.

> On the contrary, I even think it is a wrong approach. The zip is created
> locally after the clone. And the output, or the hash if you want, depends on
> the zip tool used and its settings (compression level, etc.).

No, it should not depend on it, because we really go to great lengths to
ensure it *is* reproducible; see the scripts in support/download/

> So if someone uses a tool with a different default compression level or for
> instance gzip gets optimized, or whatever, the hash will be different, even
> if the cloned repo is the same.

And if gzip no longer produced the same output, then a lot of other
things would break, because nothing previously existing would be
reproducible anymore. This would make quite a fuss, to say the least.

> (AFAIK there is no standard defining how well gzip should compress, nor does
> gzip guarantee for a given input an equivalent output between different
> future versions of gzip)

Indeed, there's no standard for the default compression level, except a
de-facto one: all versions have defaulted to level 6. At least all the
versions still relevant today; it was already level 6 20 years ago, and
probably even before that (but my memory is not that trustworthy past
that mark, sorry).

Note that you still have a (very small) point: we do not enforce the
compression level when compressing the archive:
    https://git.buildroot.org/buildroot/tree/support/download/git#n104

So, do you want to send a patch that forces level 6, please?

However, yes, there *is* a standard about the gzip compression
algorithm; gzip uses the DEFLATE algorithm, which is specified in
RFC1951: https://tools.ietf.org/html/rfc1951

If gzip were to compress with another algorithm, then it would no longer
be gzip. I would love to see a movie about sysadmins and developers who
battle in a world where gzip suddenly changes its output format. Surely
worth the popcorn. I might even go see it in 3D! ;-)

> So in fact the hash on a GIT repo in BR compares the zip tool I used to
> create the hash file and the tool that the BR user has installed on his
> machine.
> And that is surely not what you want to do, is it?

Yes it is, because it is reproducible.

> For GIT the SHA1 value together with "git fsck" seems to do the job. See the
> answer in this post:
> https://stackoverflow.com/questions/31550828/verify-git-integrity

However, we can also use a tag from a git repo, and a tag is not sufficient
to ensure the three integrity checks we need, as explained above.

So yes, we do want a hash of a tarball created by a git clone.

Regards,
Yann E. MORIN.

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-10 18:18           ` Yann E. MORIN
@ 2017-09-10 18:40             ` Thomas Petazzoni
  2017-09-10 23:30             ` Petr Kulhavy
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Petazzoni @ 2017-09-10 18:40 UTC
  To: buildroot

Hello,

On Sun, 10 Sep 2017 20:18:06 +0200, Yann E. MORIN wrote:

> Overall, the hash is there for three reasons:
> 
>  1- be sure that what we download is what we expect, to avoid
>     man-in-the-middle attacks, especially on security-sensitive
>     packages: ca-certificates, openssh, dropbear, etc...
> 
>  2- be sure that what we download is what we expect, to avoid silent
>     corruption of the downloaded blob, or blobs fscked-up by
>     intermediate CDNs (already seen!)
> 
>  3- detect when upstream completely messes up, and redoes a release,
>     like regenerating a release tarball, or re-tagging another commit,
>     after the previous one went public.

I think there is also another reason for the hashes to exist: if you
fetch from a BR2_PRIMARY_SITE or from the BR2_BACKUP_SITE, you're
really fetching tarballs, and not doing git clones. So in this case,
having a hash makes a lot of sense.
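Thomas's point is that a tarball served by BR2_PRIMARY_SITE or BR2_BACKUP_SITE never went through a git clone on the user's machine, so the recorded hash is the only way to tell that it matches what the clone would have produced. A simplified Python sketch of such a check, assuming the `.hash` layout seen in this thread (hash type, digest, file name per line; `#` comments; `none` meaning no check). Buildroot's real verification lives in its support/download/ scripts and handles more cases; the function name here is hypothetical:

```python
import hashlib
from pathlib import Path


def check_against_hash_file(archive: Path, hash_file: Path) -> bool:
    """Return True if the sha256 entries for `archive` in `hash_file` match.

    A 'none' entry means no check was requested; a missing entry raises
    ValueError. Hash types other than sha256 are ignored in this sketch."""
    actual = hashlib.sha256(archive.read_bytes()).hexdigest()
    found = False
    for raw in hash_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments such as "# Locally computed:"
        fields = line.split(None, 2)  # type, digest, file name (any whitespace)
        if len(fields) != 3:
            raise ValueError(f"malformed hash line: {raw!r}")
        kind, digest, name = fields
        if name != archive.name:
            continue
        if kind == "none":
            return True
        if kind == "sha256":
            found = True
            if digest.lower() != actual:
                return False
    if not found:
        raise ValueError(f"no sha256 entry for {archive.name} in {hash_file}")
    return True
```

The same check applies whether the archive was just generated from a clone or fetched from a mirror, which is what makes the mirrored copy trustworthy.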

Best regards,

Thomas

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-10 18:18           ` Yann E. MORIN
  2017-09-10 18:40             ` Thomas Petazzoni
@ 2017-09-10 23:30             ` Petr Kulhavy
  2017-09-11 20:04               ` Yann E. MORIN
  1 sibling, 1 reply; 11+ messages in thread
From: Petr Kulhavy @ 2017-09-10 23:30 UTC
  To: buildroot

Hi Yann,

Thank you for the exhaustive explanation. I can now see why a hash for
cloned GIT repos might be needed.

However, I'm not sure I expressed my point clearly enough regarding the
issues I see in the current hash calculation.
There is a fundamental difference between downloading a raw file (let's
say an archive from an FTP server) and calculating its hash, and cloning
a GIT repo and calculating the hash the way BR does.

*In the first case* the integrity check is done on the downloaded file.
That is, the file is downloaded locally (during which it might be
corrupted, or a different file might be downloaded in case of MITM),
then the checksum is calculated.
The sha256sum tool guarantees that the same input sequence of bytes
always produces the same hash, regardless of the version, the
implementation, or the host machine. The SHA-256 algorithm guarantees
that.
A difference in the hash automatically means a difference in the
downloaded file.

*In the second case* the sum is not calculated directly on the
downloaded file(s). The files are downloaded locally (during which they
might get corrupted, or different files might be downloaded due to MITM
or GIT repo changes).
Then they are tarred and gzipped. Then the sum is calculated.
This method may produce false negatives: a hash mismatch even though
the downloaded files are identical.

So we have the chain: download -> tar -> gzip -> sha256-sum
Do tar and gzip guarantee reproducible output on identical input across 
implementations? Or is the output version/implementation specific? Let's 
look at them closely.

Tar:
- guarantees to produce a POSIX interchangeable format from the input,
  as defined in POSIX 1003.1-1990
- you force the gnu header format, sort the files, use numeric owners,
  UID=GID=0, force the date to the checkout date -> these are all good,
  but still don't guarantee a reproducible output across implementations
- because the standard does not specify what type of padding should be
  used for strings (after the 0 character) and for files (the last block
  of a file). These are *implementation specific*. GNU tar seems to
  initialize them to 0.
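The tar normalisations Petr lists (GNU header format, sorted file list, numeric owners with UID=GID=0, a fixed date) can be illustrated with Python's `tarfile` module. This is a sketch of the idea only, assuming a POSIX host: Buildroot's own script in support/download/ drives GNU tar, so the bytes produced here are not expected to match its tarballs. The function name and the `mtime` parameter (standing in for the checkout date) are hypothetical, and file permission bits are kept as found on disk:

```python
import io
import os
import tarfile


def deterministic_tar(src_dir: str, prefix: str, mtime: int) -> bytes:
    """Pack src_dir into an in-memory GNU-format tar with normalised metadata.

    Entries are added in byte-sorted order and all get uid/gid 0, empty
    user/group names and the same mtime, so equal trees give equal bytes
    with this tarfile implementation (other tar implementations may pad
    headers and blocks differently, which is Petr's point)."""
    paths = []
    for root, dirs, files in os.walk(src_dir):
        for name in dirs + files:
            paths.append(os.path.relpath(os.path.join(root, name), src_dir))
    paths.sort(key=os.fsencode)  # byte order, like LC_ALL=C sort

    def normalise(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.uid = info.gid = 0          # numeric owner 0
        info.uname = info.gname = ""     # no symbolic owner names
        info.mtime = mtime               # e.g. the checkout date
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for rel in paths:
            tar.add(os.path.join(src_dir, rel), arcname=f"{prefix}/{rel}",
                    recursive=False, filter=normalise)
    return buf.getvalue()
```

Hashing these uncompressed bytes corresponds to dropping the gzip step from the chain; the remaining dependency is on how the tar writer lays out headers and padding.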

Gzip:
- RFCs 1950-1952 guarantee compatibility of the file format and algorithm
- the DEFLATE algorithm, however, leaves the implementation some freedom
  in how it finds matching strings. This means *a compatible
  implementation might produce different output*.
- in GNU gzip these tweaks are controlled by the compression level,
  which should be explicitly specified, as you already realised
- GNU gzip can change its implementation in the future
- implementations other than GNU gzip might produce different output. See
  https://en.wikipedia.org/wiki/DEFLATE#Encoder.2Fcompressor
  For instance, pigz, which is based on zlib, does produce different
  output (pigz compresses slightly more even at the same level 6). Yet
  you can perfectly well compress with pigz and decompress with GNU
  gunzip. See your 3D film here ;-)  https://zlib.net/pigz/
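Both sides of the gzip argument can be observed with Python's `gzip` module. Note that it is backed by zlib (like pigz), not GNU gzip, so its output is not expected to match `gzip -n -6` byte for byte. The sketch shows that with the level and header timestamp held fixed the output is identical from run to run, while changing either one changes the compressed bytes but not the decompressed data. The helper name `gz` and the sample payload are hypothetical:

```python
import gzip


def gz(data: bytes, level: int, mtime: int = 0) -> bytes:
    """Compress in memory; mtime=0 and no file name roughly mimic `gzip -n`."""
    return gzip.compress(data, compresslevel=level, mtime=mtime)


# Some repetitive, source-like input.
payload = b"".join(b"line %d of a source file\n" % i for i in range(2000))

run_1 = gz(payload, 6)
run_2 = gz(payload, 6)                      # same level, same header
stored = gz(payload, 0)                     # level 0: valid gzip, no compression
stamped = gz(payload, 6, mtime=1504224000)  # same level, different header timestamp

print("reproducible at fixed settings:", run_1 == run_2)
print("level changes bytes:", stored != run_1)
print("timestamp changes bytes:", stamped != run_1)
print("all decompress to the input:",
      all(gzip.decompress(b) == payload for b in (run_1, stored, stamped)))
```

Level 0 is only the most visible case; per Petr's pigz example, two DEFLATE encoders can also differ at the same nominal level.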

So the current hash calculation for cloned GIT repos depends on the
tools used. Is that clearer now?


What to do then? I can see several options, with different reliability 
and practicality:
1) the 100% reliable solution is to calculate the checksum of each
individual (raw) file and also compare the file names, e.g.

    LC_ALL=C find . -type f -print0 | sort -z | xargs -r0 sha256sum | sha256sum
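Option 1 can be sketched in Python to make the idea concrete: hash each regular file, emit `sha256sum`-style `<digest>  ./<path>` lines in byte order of the path (as `LC_ALL=C sort -z` would), and hash that listing. The function name is hypothetical, and unlike `sha256sum` it does not escape file names containing backslashes or newlines, so the final digest is only assumed to match the shell pipeline for ordinary names:

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Hash a directory tree by file names and contents, independent of tar/gzip."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Like `find -type f`: regular files only, symlinks are skipped.
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                paths.append("./" + rel)
    paths.sort(key=os.fsencode)  # LC_ALL=C byte order, as with sort -z

    listing = hashlib.sha256()
    for path in paths:
        file_hash = hashlib.sha256()
        with open(os.path.join(root, path), "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                file_hash.update(chunk)
        # One sha256sum-style line: "<hex digest>  <path>\n"
        listing.update(file_hash.hexdigest().encode() + b"  " + os.fsencode(path) + b"\n")
    return listing.hexdigest()
```

Permissions and symlinks do not contribute to this digest, since `find -type f` skips symlinks and `sha256sum` only reads contents.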


2) another 100% reliable solution: bundle BR with a specific version of
tar and gzip (or download and build them) and use the current method.
However, the same tools should then be used to create the hash file.

3) the almost 100% working solution is to remove the gzip step and
calculate the checksum of the tar. This depends only on the padding
implementation in tar, and it is reasonable to assume zero-padding.
To be sure, the Buildroot documentation should be updated to state
that *GNU* tar is required.

4) the implementation-dependent solution is to use tar.gz as now, force
the compression level, document that *GNU* gzip is required and keep
our fingers crossed that gzip doesn't change its implementation in the
future.

In any case, if specific versions of the tools are assumed (and the 
current implementation does assume them), this should be very clearly 
documented.

Regards
Petr

* [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version
  2017-09-10 23:30             ` Petr Kulhavy
@ 2017-09-11 20:04               ` Yann E. MORIN
  0 siblings, 0 replies; 11+ messages in thread
From: Yann E. MORIN @ 2017-09-11 20:04 UTC
  To: buildroot

Petr, All,

[note: please wrap your mails to ~72-80 chars]

On 2017-09-11 01:30 +0200, Petr Kulhavy spake thusly:
[--SNIP--]
> In the second case the sum is not calculated directly on the downloaded
> file(s). The files are downloaded to local (during which they might get
> corrupted, or different files are downloaded due to MITM or GIT repo
> changes).

As Thomas also pointed out, the tarballs may also come from a primary
site or a backup site.

For example, we have such a backup mirror that is publicly available:
    http://sources.buildroot.org/

In this case, we also want to ensure that the archive downloaded from
there is what the user would have had if they had done the clone.

> Then they are tared and gzipped. Then the sum is calculated.
> This method may produce false negatives.

Well, theoretically, that's true.

> So we have the chain: download -> tar -> gzip -> sha256-sum
> Do tar and gzip guarantee reproducible output on identical input
> across implementations? Or is the output version/implementation
> specific? Let's look at them closely.
> 
> Tar:
> - guarantees to produce a POSIX interchangeable format from the
>   input, as defined in POSIX 1003.1-1990
> - you force the gnu header format, sort the files, use numeric
>   owners, UID=GID=0, force the date to the checkout date -> these
>   are all good, but still don't guarantee a reproducible output
>   across implementations
> - because the standard does not specify what type of padding should
>   be used for strings (after the 0 character) and for files (the
>   last block of a file). These are implementation specific. GNU
>   tar seems to initialize them to 0.
> 
> Gzip:
> - RFCs 1950-1952 guarantee compatibility on the file format and
>   algorithm
> - the DEFLATE algorithm however has some free space for the
>   implementation to find the matching strings. This means a compatible
>   implementation might produce different output.
> - in GNU gzip these tweaks are controlled by the compression level in
>   gzip, which should be explicitly specified as you already realised
> - GNU gzip can change its implementation in the future
> - other implementation than GNU gzip might produce different output.
>   See https://en.wikipedia.org/wiki/DEFLATE#Encoder.2Fcompressor

OK, so you did more thorough research than I did! ;-)

> For instance the pigz based on zlib does produce a different output
> (pigz compresses slightly more even at the same level 6). Yet, you
> can perfectly compress with pigz and decompress with GNU gunzip.
> See your 3D film here ;-)? [2]https://zlib.net/pigz/

Damned! ;-)
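
A minimal sketch of the DEFLATE point above, using Python's zlib module as a stand-in encoder. This is an assumption for illustration only: it is the zlib library, not GNU gzip, and it emits the RFC 1950 zlib wrapper rather than the RFC 1952 gzip one; only the DEFLATE payload matters here. The same input is compressed at every level: every output decompresses to identical bytes, yet the compressed bytes are not all the same, so a checksum over them depends on the encoder and its settings.

```python
import zlib

# Assumption: a repetitive sample payload; any compressible input would do.
data = b"linuxptp " * 2000

# Compress the same input at every zlib level (0 = stored, 9 = best).
outputs = {level: zlib.compress(data, level) for level in range(10)}

# Every stream is valid DEFLATE and decompresses to the identical input...
for level, blob in outputs.items():
    assert zlib.decompress(blob) == data, level

# ...but the compressed bytes are not unique: stored blocks (level 0) and
# fully compressed blocks (level 9) encode the same data differently.
assert outputs[0] != outputs[9]

for level, blob in sorted(outputs.items()):
    print(f"level {level}: {len(blob)} bytes")
```

The sketch only asserts the level 0 vs level 9 difference, because whether two intermediate levels happen to produce identical bytes depends on the input.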

> So the current hash calculation for cloned Git repos depends on the
> tools used. Is that clearer now?

Yes, I see that this is more complex than I originally thought...

> What to do then? I can see several options, with different reliability
> and practicality:

> 1) the 100% reliable solution is to calculate the checksum of each
> individual (raw) file, and also compare the file names, e.g.:
> LC_ALL=C find . -type f -print0 | sort -z | xargs -r0 sha256sum | sha256sum

Nope.
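
For reference, option 1's one-liner can be approximated in Python as below. This is a hypothetical sketch, not anything Buildroot ships: `tree_digest` is an illustrative name, and it assumes file names contain no backslash or newline (`sha256sum` escapes those in its output, which this sketch does not replicate). Like the one-liner, it covers only regular-file contents and paths; symlinks, permissions and empty directories do not affect the result.

```python
import hashlib
import os
import stat


def _sha256_file(path: str) -> str:
    """Hex SHA-256 of one file's raw contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digest(root: str) -> str:
    """Hypothetical equivalent of, run from inside `root`:
    LC_ALL=C find . -type f -print0 | sort -z | xargs -r0 sha256sum | sha256sum
    """
    # Collect regular files only, like `find -type f` (symlinks are skipped).
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if stat.S_ISREG(os.lstat(full).st_mode):
                rel_paths.append("./" + os.path.relpath(full, root))

    # Byte-wise ordering, like `LC_ALL=C sort -z`.
    rel_paths.sort(key=os.fsencode)

    # One "<hex>  <path>" line per file, like sha256sum's output, then hash
    # the whole listing, like the final `| sha256sum`.
    listing = hashlib.sha256()
    for rel in rel_paths:
        digest = _sha256_file(os.path.join(root, rel))
        listing.update(digest.encode() + b"  " + os.fsencode(rel) + b"\n")
    return listing.hexdigest()
```

Because it never goes through tar or gzip, the result does not depend on either tool's implementation, only on the file contents and names.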

> 2) another 100% reliable solution: bundle BR with a specific version
>    of tar and gzip (or download and build them) and use the current
>    method. However, the same tools should then be used to create the
>    hash file.

Nope as well.

> 3) the almost 100% working solution is to remove the gzip step and
> calculate the checksum of the tar. This depends only on the padding
> implementation in tar, and it is reasonable to assume zero-padding.
> To be safe, the Buildroot documentation should state that GNU tar
> is required.

That one is not 100% either, so it is not an improvement over the existing one.

> 4) the implementation-dependent solution is to use tar.gz as now,
> force the compression level, document that GNU gzip is required,
> and cross our fingers that gzip doesn't change its implementation in
> the future.

I think this is a safe bet, yes. I've built almost all gzip versions
available from upstream, and they all generate the exact same output.
The first was released in 1993, and the latest in 2016:

    $ gzip-${version} -n -6 <foo >foo.gzip-${version}.gz

and the result is:

    $ sha1sum *.gz
    acdbbb3dfed79a24caeaf22a4c0201033ebe363b  foo.gzip-1.2.4a.gz
    acdbbb3dfed79a24caeaf22a4c0201033ebe363b  foo.gzip-1.2.4.gz
    acdbbb3dfed79a24caeaf22a4c0201033ebe363b  foo.gzip-1.3.13.gz
    acdbbb3dfed79a24caeaf22a4c0201033ebe363b  foo.gzip-1.5.gz
    acdbbb3dfed79a24caeaf22a4c0201033ebe363b  foo.gzip-1.6.gz
    acdbbb3dfed79a24caeaf22a4c0201033ebe363b  foo.gzip-1.7.gz
    acdbbb3dfed79a24caeaf22a4c0201033ebe363b  foo.gzip-1.8.gz

So, clearly gzip does have a very stable output.
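
The `-n` in the gzip command above matters as well: RFC 1952 stores a 4-byte little-endian MTIME in bytes 4 to 7 of the gzip header, and `-n` keeps the original name and timestamp out of it. A minimal sketch with Python's gzip module (assumptions: `mtime=0` plays the role of `-n`, and since Python compresses via zlib, its bytes need not match GNU gzip's; only the header behaviour is illustrated):

```python
import gzip
import struct

data = b"some tarball contents\n" * 100

# With MTIME fixed to 0, repeated runs give byte-identical .gz output.
a = gzip.compress(data, compresslevel=6, mtime=0)
b = gzip.compress(data, compresslevel=6, mtime=0)
assert a == b
assert a[4:8] == b"\x00\x00\x00\x00"

# Storing a real (arbitrary) timestamp changes the MTIME header field,
# and therefore the checksum of the .gz file...
c = gzip.compress(data, compresslevel=6, mtime=1504976220)
assert struct.unpack("<I", c[4:8])[0] == 1504976220
assert a != c

# ...even though the decompressed payload is the same.
assert gzip.decompress(a) == gzip.decompress(c) == data
```

So forcing both the compression level and the absence of a timestamp is what makes the gzip step repeatable, on top of the encoder itself being stable.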

(Note: 1.3.x, 1.3.12 and 1.4 were not tested, because they fail to build
on my machine.)

And we do use gzip, not any other variant, so I would say that we
stick with the current state (except maybe forcing the compression
level).

Now, for tar, that was a bit more complex, because the versions older
than 1.27 do not build, or crash because of overflows. But for 1.27
(released in 2013) and later, the output is also reproducible:

    $ tar-${version} cf - \
         --numeric-owner --owner=0 --group=0 \
         --mtime=1970-01-01T00:00:00Z --format=gnu \
         -T foo.sorted >foo.tar-${version}.tar

and the result is:

    $ sha1sum *.tar
    378fd66d420af1ea18d58f4dece3e7a15588bbcf  foo.tar-1.27.1.tar
    378fd66d420af1ea18d58f4dece3e7a15588bbcf  foo.tar-1.27.tar
    378fd66d420af1ea18d58f4dece3e7a15588bbcf  foo.tar-1.28.tar
    378fd66d420af1ea18d58f4dece3e7a15588bbcf  foo.tar-1.29.tar

Again, pretty stable...
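
The normalizations in the tar command above (sorted names, owner and group 0, fixed mtime, GNU format) can be sketched with Python's tarfile module. This is a hypothetical illustration, not Buildroot's helper: `normalized_tar` is an illustrative name, clearing `uname`/`gname` only approximates the intent of `--numeric-owner --owner=0 --group=0`, and file mode bits are kept as on disk, as they are in the command above.

```python
import io
import os
import tarfile


def normalized_tar(root: str) -> bytes:
    """Archive `root` with sorted names, owner/group 0 and mtime 0."""
    # Collect every entry (directories and files), sorted byte-wise.
    members = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            members.append(os.path.relpath(os.path.join(dirpath, name), root))
    members.sort(key=os.fsencode)

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Strip the metadata that varies between checkouts and machines.
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        info.mtime = 0  # like --mtime=1970-01-01T00:00:00Z
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for rel in members:
            # recursive=False: the sorted list already contains every entry.
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False,
                    filter=normalize)
    return buf.getvalue()
```

Two checkouts of the same tree, created in a different order and at different times, then yield the same archive bytes, which is what the hash relies on.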

> In any case, if specific versions of the tools are assumed (and the
> current implementation does assume them), this should be very clearly
> documented.

Agreed. But it is not needed, as shown above: gzip is *very* *very*
stable in the output it generates; tar looks like it is also really
stable.

So, even though this is technically possible, I have a lot of doubt
that this would ever happen, at least not in the foreseeable future.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'


end of thread, other threads:[~2017-09-11 20:04 UTC | newest]

Thread overview: 11+ messages
2017-09-09 17:17 [Buildroot] [PATCH 1/1] linuxptp: bump to the latest version Petr Kulhavy
2017-09-09 20:08 ` Thomas Petazzoni
2017-09-09 20:53   ` Petr Kulhavy
2017-09-10  6:04     ` Thomas Petazzoni
2017-09-10  9:24       ` Yann E. MORIN
2017-09-10 10:31         ` Petr Kulhavy
2017-09-10 18:18           ` Yann E. MORIN
2017-09-10 18:40             ` Thomas Petazzoni
2017-09-10 23:30             ` Petr Kulhavy
2017-09-11 20:04               ` Yann E. MORIN
2017-09-10  9:57       ` Petr Kulhavy
