From: Patrick Berkeley <patrickberkeley@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: On Tracking Binary Files
Date: Tue, 14 Apr 2009 16:27:10 -0400
Message-ID: <7efce40a0904141327w3cbfbfecwbe7d5d9125fe8d4a@mail.gmail.com>
In-Reply-To: <7vws9n2e7p.fsf@gitster.siamese.dyndns.org>
Junio,
Thanks a lot for your thorough explanation.
Patrick
On Tue, Apr 14, 2009 at 16:05, Junio C Hamano <gitster@pobox.com> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> On Tue, 14 Apr 2009, Patrick Berkeley wrote:
>>
>>> Does Git track the deltas on binary files?
>>>
>>> Someone in #git mentioned that if the binaries change too much Git no
>>> longer just stores the changes. If this is the case, what is the
>>> breaking point where Git goes from storing the deltas to the entire
>>> new file?
>>
>> Git does not store the deltas as you think it does. The deltification of
>> the objects is almost independent of the commit history, i.e. we
>> _always_ store snapshots for most practical purposes.
>
> "Always store snapshots" sounds as if you are not storing deltas at all. I
> think I know what you meant to say, but the way you phrased it is
> misleading.
>
> Documentation/technical/pack-heuristics.txt talks about this in some
> detail. A short version is:
>
> - It does not make a difference if you are dealing with binary or text;
>
> - The delta is not necessarily against the same path in the previous
> revision, so even a new file added to the history can be stored in a
> deltified form;
>
> - When an object stored in the deltified representation is used, it
> incurs more cost than using the same object stored in the compressed
> base representation. The deltification mechanism makes a trade-off
> that takes this cost into account, as well as space efficiency.
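The first three points can be illustrated with a toy copy/insert delta encoder. This is only a sketch under stated assumptions: `make_delta`, `apply_delta`, and the tuple instruction format are hypothetical (they are not git's pack format), and the matching is done with Python's `difflib` rather than git's own delta code. Because it compares raw bytes, it never asks whether content is text or binary, and because it never looks at a path, any stored object can serve as the base.

```python
from difflib import SequenceMatcher

# Hypothetical toy delta format (NOT git's pack format):
#   ("copy", offset, length)  -> copy base[offset:offset + length]
#   ("insert", data)          -> append literal bytes
Delta = list


def make_delta(base: bytes, target: bytes) -> Delta:
    """Describe `target` as copies out of `base` plus literal inserts.

    Only raw bytes are compared: there is no notion of text vs. binary,
    and no file path is involved, so any stored object can be a base.
    """
    ops: Delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": those base bytes are simply never copied
    return ops


def apply_delta(base: bytes, delta: Delta) -> bytes:
    """Rebuild the target; note that the base object is needed first."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    # A "binary" blob and an edited copy; names and paths play no role.
    blob_a = bytes(range(256)) * 4
    blob_b = blob_a[:500] + b"\x00\xff" * 10 + blob_a[500:]
    assert apply_delta(blob_a, make_delta(blob_a, blob_b)) == blob_b
```

Reading a deltified object means obtaining its base and then running an `apply_delta` pass over it, which is the extra read-time cost the third point refers to; if the base is itself deltified, that work repeats along the chain.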
>
> The last point is probably not covered by the IRC talk with Linus that
> is recorded in pack-heuristics.txt. Basically:
>
> - A deltified object is stored as a (compressed) xdelta against some
> base object. If the best deltified representation we come up with is
> larger than the result of just compressing the object without
> deltification, it is not worth storing it from the space consumption
> point of view. Thus, we originally said something like "if an
> attempted delta is larger than half of the object size (assuming an
> average compression ratio of 50%), do not use the deltified
> representation, it is not worth it". We attempt to delta against many
> base objects to pick the best possible delta; the number of attempts is
> called the delta window.
>
> - The base object of a deltified object could also be deltified, and you
> may need to repeatedly apply delta on top of some object that is not a
> delta to get to the final object. The length of this chain is called
> delta depth, and obviously you would want to keep the delta depth short
> to get reasonable runtime performance. Thus, when deltifying one
> object A, we make a weighted comparison between the size of the delta
> to build it out of an object of depth N and the size of the delta to
> build it out of an object of depth M. A slightly larger delta that is
> based on an object with a shallower delta depth is favored over a
> smaller delta based on an object with a much deeper delta depth.
>
>
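The two heuristics Junio describes (a limited number of delta attempts with a "half the object size" cutoff, and a preference for shallower bases) can be sketched as a toy base-selection routine. Everything here is an assumption for illustration rather than git's actual code: the size model (`COPY_OP_COST`), the `Candidate` type, the scoring formula that penalizes depth, and the default `window`/`max_depth` values are hypothetical. Git does expose the real knobs as the `--window` and `--depth` options of `git pack-objects`/`git repack` and the `pack.window`/`pack.depth` configuration variables, and it orders candidates with its own heuristics before trying them.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

COPY_OP_COST = 4  # hypothetical encoded size of one copy instruction


def delta_size(base: bytes, target: bytes) -> int:
    """Rough size estimate of a copy/insert delta (toy cost model)."""
    size = 0
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            size += COPY_OP_COST
        elif tag in ("replace", "insert"):
            size += 1 + (j2 - j1)  # 1 opcode byte + literal data
    return size


@dataclass
class Candidate:
    name: str
    data: bytes
    depth: int  # 0 = stored whole; n = n deltas must be applied to read it


def choose_base(target: bytes, candidates: List[Candidate],
                window: int = 10, max_depth: int = 50
                ) -> Optional[Tuple[str, int, int]]:
    """Pick a delta base for `target`.

    Returns (base name, delta size, resulting depth), or None when the
    object should just be compressed and stored whole.
    """
    best = None
    best_score = None
    for cand in candidates[:window]:  # only `window` attempts are made
        if cand.depth >= max_depth:
            continue  # the chain would exceed the depth limit
        size = delta_size(cand.data, target)
        if size > len(target) / 2:
            continue  # "larger than half the object size": not worth it
        # Hypothetical weighting: the same delta size scores worse the
        # closer its base already is to max_depth.
        score = size * max_depth / (max_depth - cand.depth)
        if best_score is None or score < best_score:
            best, best_score = (cand.name, size, cand.depth + 1), score
    return best
```

With this weighting, an exact copy of the target sitting at depth 40 can lose to a near copy stored whole, which mirrors the "slightly larger delta on a shallower base" preference; how strongly depth is penalized is a tuning choice, not something the thread specifies.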