* On Tracking Binary Files
From: Patrick Berkeley @ 2009-04-14 14:42 UTC
To: git
Does Git track the deltas on binary files?
Someone in #git mentioned that if the binaries change too much Git no
longer just stores the changes. If this is the case, what is the
breaking point where Git goes from storing the deltas to the entire
new file?
Thanks, Patrick
* Re: On Tracking Binary Files
From: Johannes Schindelin @ 2009-04-14 16:54 UTC
To: Patrick Berkeley; +Cc: git
Hi,
On Tue, 14 Apr 2009, Patrick Berkeley wrote:
> Does Git track the deltas on binary files?
>
> Someone in #git mentioned that if the binaries change too much Git no
> longer just stores the changes. If this is the case, what is the
> breaking point where Git goes from storing the deltas to the entire
> new file?
Git does not store the deltas as you think it does. The deltification of
the objects is almost independent from the commit history, i.e. we
_always_ store snapshots for most practical matters.
Ciao,
Dscho
* Re: On Tracking Binary Files
From: Nicolas Pitre @ 2009-04-14 19:42 UTC
To: Patrick Berkeley; +Cc: git
On Tue, 14 Apr 2009, Patrick Berkeley wrote:
> Does Git track the deltas on binary files?
Yes. And actually git's delta storage doesn't care at all whether a
file is text or binary.
> Someone in #git mentioned that if the binaries change too much Git no
> longer just stores the changes. If this is the case, what is the
> breaking point where Git goes from storing the deltas to the entire
> new file?
If two versions of the same file are simply too different to make delta
compression worth it, then no deltas are used. It is still possible
that a third version of the same file would produce a nice delta against
either the first or second version though, in which case that third
version will be stored as a delta. And so on.
A sophisticated set of heuristics is applied to the list of objects as a
whole to determine the best delta arrangement possible. So there is no
such thing as a simple "breaking point".
Nicolas
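The behaviour described above (two versions too different to delta, yet a
third version that deltas nicely against one of them) can be sketched in a
few lines of Python. This is only a toy model, not Git's code: the
copy/insert cost model, the COPY_COST constant, the "half the object size"
cutoff and the names toy_delta_size and plan_storage are all assumptions
made for illustration. It also only considers earlier versions as bases,
which is a simplification. It works on raw bytes, so text versus binary
makes no difference.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

COPY_COST = 8  # assumed fixed cost of one "copy a range from the base" instruction


def toy_delta_size(base: bytes, target: bytes) -> int:
    """Estimate the size of a copy/insert delta that rebuilds target from base."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    size = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            size += COPY_COST   # reuse bytes that already exist in the base
        elif tag in ("replace", "insert"):
            size += j2 - j1     # new bytes must be stored literally
        # "delete" costs nothing: the bytes are simply not copied
    return size


def plan_storage(
    versions: Sequence[Tuple[str, bytes]],
) -> List[Tuple[str, Optional[str]]]:
    """For each version, name the base it would delta against, or None."""
    plan = []
    for index, (name, data) in enumerate(versions):
        best_base = None
        best_size = len(data) // 2  # assumed cutoff: a delta must beat this
        for base_name, base_data in versions[:index]:
            size = toy_delta_size(base_data, data)
            if size < best_size:
                best_base, best_size = base_name, size
        plan.append((name, best_base))
    return plan


v1 = bytes(range(100))                   # first version
v2 = bytes(range(100, 200))              # shares nothing with v1
v3 = v1[:50] + b"\xff\xfe" + v1[50:]     # v1 with two bytes inserted
for name, base in plan_storage([("v1", v1), ("v2", v2), ("v3", v3)]):
    print(name, "stored whole" if base is None else f"delta against {base}")
```

In this sketch v2 is kept whole because no candidate base gives a small
enough delta, while v3 still finds a good base in v1. Nothing about v2's
fate decides v3's, which is why no single "breaking point" exists.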
* Re: On Tracking Binary Files
From: Patrick Berkeley @ 2009-04-14 19:44 UTC
To: git
Thanks very much for the explanation.
* Re: On Tracking Binary Files
From: Junio C Hamano @ 2009-04-14 20:05 UTC
To: Johannes Schindelin; +Cc: Patrick Berkeley, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> On Tue, 14 Apr 2009, Patrick Berkeley wrote:
>
>> Does Git track the deltas on binary files?
>>
>> Someone in #git mentioned that if the binaries change too much Git no
>> longer just stores the changes. If this is the case, what is the
>> breaking point where Git goes from storing the deltas to the entire
>> new file?
>
> Git does not store the deltas as you think it does. The deltification of
> the objects is almost independent from the commit history, i.e. we
> _always_ store snapshots for most practical matters.
"Always store snapshots" sounds as if you are not storing deltas at all. I
think I know what you meant to say, but the way you phrased it is
misleading.
Documentation/technical/pack-heuristics.txt talks about this in some
detail. A short version is:
- It does not make a difference if you are dealing with binary or text;
- The delta is not necessarily against the same path in the previous
revision, so even a new file added to the history can be stored in a
deltified form;
- When an object stored in the deltified representation is used, it would
incur more cost than using the same object in the compressed base
representation. The deltification mechanism makes a trade-off taking
this cost into account, as well as the space efficiency.
The last point is probably not covered by the pack-heuristics IRC talk
with Linus that is in the documentation. Basically:
- A deltified object is stored as a (compressed) xdelta against some
base object. If the best deltified representation we come up with is
larger than the result of just compressing the object without
deltification, it is not worth storing it from the space consumption
point of view. Thus, we originally said something like "if an
attempted delta is larger than half of the object size (assuming an
average compression ratio of 50%), do not use the deltified
representation, it is not worth it". We attempt to delta against many
base objects to pick the best possible delta; the number of attempts is
called the delta window.
- The base object of a deltified object could also be deltified, and you
may need to repeatedly apply deltas on top of some object that is not a
delta to get to the final object. The length of this chain is called
delta depth, and obviously you would want to keep the delta depth short
to gain a reasonable runtime performance. Thus, when deltifying one
object A, we make a weighted comparison between the size of the delta
to build it out of an object of depth N and the size of the delta to
build it out of an object of depth M. A slightly larger delta that is
based on an object with a shallower delta depth is favored over a
smaller delta based on an object with a much deeper delta depth.
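The trade-off in the two points above can be sketched as a small Python
function. It is a rough illustration under stated assumptions, not the
formula Git actually uses: the Candidate type, the pick_base name, the
linear depth weighting, and the default window and max_depth values are
all hypothetical choices made for this example. Only the ideas come from
the text: a limited window of attempts, a size limit of half the object,
and a preference for shallow bases.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str        # label for a possible base object (hypothetical)
    delta_size: int  # size of the delta that rebuilds the target from it
    depth: int       # 0 = stored whole, 1 = delta on a whole object, ...


def pick_base(
    target_size: int,
    candidates: Sequence[Candidate],
    window: int = 10,      # assumed value: how many bases are attempted
    max_depth: int = 50,   # assumed value: longest delta chain allowed
) -> Optional[Candidate]:
    """Return the preferred base, or None to store the object whole."""
    size_limit = target_size // 2  # the "half of the object size" rule
    best = None
    best_cost = None
    for cand in candidates[:window]:       # only `window` attempts are made
        if cand.depth >= max_depth:
            continue                       # chain would become too long
        if cand.delta_size >= size_limit:
            continue                       # not worth it space-wise
        # Assumed weighting: the deeper the base, the more its delta "costs".
        cost = cand.delta_size * max_depth / (max_depth - cand.depth)
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best


shallow = Candidate("shallow", delta_size=120, depth=1)
deep = Candidate("deep", delta_size=100, depth=40)
choice = pick_base(1000, [deep, shallow])
print(choice.name if choice else "store whole")
```

With these numbers the slightly larger delta on the shallow base wins over
the smaller delta on the deep one, and a candidate whose delta is at least
half the object size is never chosen.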
* Re: On Tracking Binary Files
From: Patrick Berkeley @ 2009-04-14 20:27 UTC
To: Junio C Hamano; +Cc: git
Junio,
Thanks a lot for your thorough explanation.
Patrick