From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: "João Victor Bonfim" <JoaoVictorBonfim@protonmail.com>
Cc: Martin Fick <mfick@codeaurora.org>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: Fw: Curiosity
Date: Sat, 18 Dec 2021 01:34:11 +0000
Message-ID: <Yb06k5ob+bl/oE68@camp.crustytoothpaste.net>
In-Reply-To: <1X3gQ48NK5aBDHcpYMlxESRjqubcCBKJUQu2K0dBOnTyvsXCXXoGDBg2Ff4KarK6WsZnzN3HgqHGOlCKKdF-wtZQ5tHsoAcfit2CTXMWqh4=@protonmail.com>

On 2021-12-18 at 00:15:59, João Victor Bonfim wrote:
> > I suspect that for most algorithms and their implementations, this would
> > not result in repeatable "recompressed" results. Thus the checked-out
> > files might be different every time you checked them out. :(
> 
> How or why?
> 
> Sincere question.

A lossless compression algorithm only has to produce an encoded value
that, when decoded, yields the original input.  Ideally, the encoded
value will also be smaller than the original input.  Beyond that,
there's a great deal of freedom in how to implement it.

Just taking Deflate, which is used in zlib and gzip, as an example,
there are different compression settings, such as the compression level
and the size of the window, that affect compression speed, quality of
compression (resulting size), and memory usage.  One might prefer
gzip -1 for better speed or lower memory usage, or gzip -9 to reduce
the file size as much as possible.
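
As a rough illustration (a sketch, not something any tool here does
verbatim), Python's standard zlib module can compress the same input at
levels 1 and 9.  The helper name compress_at_levels and the sample input
are made up for this example.

```python
import zlib


def compress_at_levels(data: bytes) -> tuple:
    """Compress the same input with the fastest and the strongest zlib level."""
    fast = zlib.compress(data, 1)  # roughly in the spirit of gzip -1
    best = zlib.compress(data, 9)  # roughly in the spirit of gzip -9
    return fast, best


if __name__ == "__main__":
    # Hypothetical sample input; any bytes would do.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 200
    fast, best = compress_at_levels(sample)

    # Both streams decode to the original input, so both are lossless.
    assert zlib.decompress(fast) == sample
    assert zlib.decompress(best) == sample

    # The encoded bytes themselves are not the same.
    print(len(fast), len(best), fast == best)
```

Both streams decode to identical bytes, yet the streams differ; the
zlib header alone records a hint about the level that was used.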

Even when the same settings are used, the encoded output can vary
between versions of the software, since an implementation is free to
change its technique.  For example, GitHub effectively uses git archive
to generate archives, and at one point, when they upgraded their
servers, the compressed output of the tarballs and zip files changed,
and everybody who was relying on the archives being bit-for-bit
identical[0] had a problem.

So it would be nearly impossible to produce bit-for-bit repeatable
results without pinning a specific, hard-coded implementation.  Even
then, the behavior might need to change for security reasons, so
repeatability would still be difficult to guarantee.
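
One way a consumer might avoid depending on the compressed bytes is to
compare digests of the decompressed payload instead.  The sketch below
assumes a zlib stream and uses two hypothetical helpers, stream_digest
and payload_digest; it only takes the compressor out of the comparison
and does not make the underlying content any more stable.

```python
import hashlib
import zlib


def stream_digest(stream: bytes) -> str:
    """SHA-256 of the compressed bytes; changes whenever the encoder's output does."""
    return hashlib.sha256(stream).hexdigest()


def payload_digest(stream: bytes) -> str:
    """SHA-256 of the decompressed bytes; independent of how they were encoded."""
    return hashlib.sha256(zlib.decompress(stream)).hexdigest()


if __name__ == "__main__":
    # Hypothetical payload standing in for an archive's contents.
    payload = b"example archive contents\n" * 100
    a = zlib.compress(payload, 1)
    b = zlib.compress(payload, 9)

    print(stream_digest(a) == stream_digest(b))    # compressed bytes differ
    print(payload_digest(a) == payload_digest(b))  # decoded content matches
```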

[0] Neither Git nor GitHub provides this guarantee, so please do not
make this mistake.  If you need a fixed bit-for-bit tarball, save it as
a release artifact.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
