git.vger.kernel.org archive mirror
From: Patrick Berkeley <patrickberkeley@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: On Tracking Binary Files
Date: Tue, 14 Apr 2009 16:27:10 -0400
Message-ID: <7efce40a0904141327w3cbfbfecwbe7d5d9125fe8d4a@mail.gmail.com>
In-Reply-To: <7vws9n2e7p.fsf@gitster.siamese.dyndns.org>

Junio,

Thanks a lot for your thorough explanation.

Patrick

On Tue, Apr 14, 2009 at 16:05, Junio C Hamano <gitster@pobox.com> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> On Tue, 14 Apr 2009, Patrick Berkeley wrote:
>>
>>> Does Git track the deltas on binary files?
>>>
>>> Someone in #git mentioned that if the binaries change too much Git no
>>> longer just stores the changes. If this is the case, what is the
>>> breaking point where Git goes from storing the deltas to the entire
>>> new file?
>>
>> Git does not store the deltas as you think it does.  The deltification of
>> the objects is almost independent from the commit history, i.e. we
>> _always_ store snapshots for most practical matters.
>
> "Always store snapshots" sounds as if you are not storing deltas at all.  I
> think I know what you meant to say, but the way you phrased it is
> misleading.
>
> Documentation/technical/pack-heuristics.txt talks about this in some
> detail.  A short version is:
>
>  - It does not make a difference if you are dealing with binary or text;
>
>  - The delta is not necessarily against the same path in the previous
>   revision, so even a new file added to the history can be stored in a
>   deltified form;
>
>  - When an object stored in the deltified representation is used, it would
>   incur more cost than using the same object in the compressed base
>   representation.  The deltification mechanism makes a trade-off taking
>   this cost into account, as well as the space efficiency.
>
> The last point is probably not covered by the pack-heuristics IRC talk
> with Linus that we have in the documentation.  Basically:
>
>  - A deltified object is stored as a (compressed) xdelta against some
>   base object.  If the best deltified representation we come up with is
>   larger than the result of just compressing the object without
>   deltification, it is not worth storing it from the space consumption
>   point of view.  Thus, we originally said something like "if an
>   attempted delta is larger than half of the object size (assuming
>   average 50% of compression ratio), do not use the deltified
>   representation, it is not worth it".  We attempt to delta against many
>   base objects to pick the best possible delta; the number of attempts is
>   called the delta window.
>
>  - The base object of a deltified object could also be deltified, and you
>   may need to repeatedly apply delta on top of some object that is not a
>   delta to get to the final object.  The length of this chain is called
>   delta depth, and obviously you would want to keep the delta depth short
>   to gain a reasonable runtime performance.  Thus, when deltifying one
>   object A, we make a weighted comparison between the size of the delta
>   to build it out of an object of depth N and the size of the delta to
>   build it out of an object of depth M.  A slightly larger delta that is
>   based on an object with a shallower delta depth is favored over a
>   smaller delta based on an object with a much deeper delta depth.
>
>
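
The two rules Junio describes (reject a delta that is too large to be
worth storing, and prefer a shallower base over a slightly smaller delta)
can be sketched roughly as below.  This is only an illustration, not git's
actual code: the names Candidate, size_limit and pick_base are
hypothetical, the linear scaling of the size limit and the
cost formula are assumptions chosen to show a "weighted comparison", and
the window and max_depth defaults are illustrative values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A possible base object for the object being deltified (hypothetical)."""
    name: str        # label for the base object
    delta_size: int  # bytes needed for the delta against this base
    depth: int       # delta chain length of the base (0 = stored whole)


def size_limit(object_size: int, base_depth: int, max_depth: int) -> int:
    """Largest delta accepted against a base sitting at base_depth."""
    # Start from the "half of the object size" rule quoted above, then
    # shrink the allowance as the base gets deeper (assumed linear scaling).
    half = object_size // 2
    return half * (max_depth - base_depth) // max_depth


def pick_base(object_size: int,
              candidates: Sequence[Candidate],
              window: int = 10,
              max_depth: int = 50) -> Optional[Candidate]:
    """Return the preferred base, or None to store the object undeltified."""
    best: Optional[Candidate] = None
    best_cost: Optional[float] = None
    # Only the first `window` candidates are tried: the delta window.
    for cand in candidates[:window]:
        if cand.depth >= max_depth:
            continue  # chain would become too long
        if cand.delta_size >= size_limit(object_size, cand.depth, max_depth):
            continue  # delta too large to be worth storing
        # Weighted comparison: the same delta size costs more on a deep base.
        cost = cand.delta_size / (max_depth - cand.depth)
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best
```

With object_size=1000 and max_depth=10, a 120-byte delta against a depth-0
base (cost 12) wins over a 90-byte delta against a depth-8 base (cost 45),
and a 600-byte delta is rejected outright because it exceeds the 500-byte
limit.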
