From: Nicolas Pitre <nico@cam.org>
To: Dan Holmsand <holmsand@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] improved delta support for git
Date: Wed, 18 May 2005 00:32:51 -0400 (EDT)
Message-ID: <Pine.LNX.4.62.0505180005230.20274@localhost.localdomain>
In-Reply-To: <d6dohe$dql$1@sea.gmane.org>
On Tue, 17 May 2005, Dan Holmsand wrote:
> I've been trying out your delta stuff as well. It was a bit
> disappointing at first, but some tweaking paid off in the end...
Cool!
My goal is to provide the mechanism, to be used by a higher-level tool
implementing the deltafication policy. I only provided one script as an
example, but it turns out that you found a way to achieve better space
savings. And I bet there are ways to do even better with a more
exhaustive search for delta targets. For example, you could try all
possible combinations on an object list for each file (and let it run
overnight).
> 1) Too many deltas get too big and/or compress badly.
One thing I've been wondering about is whether gzipping small deltas is
actually a gain. For very small files, gzip seems to add enough overhead
that the compressed file ends up larger than the original. It might be
worth storing a delta uncompressed when its compressed version turns out
to be larger.
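A minimal sketch of that check, written in Python for brevity, with the
standard zlib module standing in for the compressor. The function name
pack_delta and the returned flag are assumptions for illustration only; a
real object format would also need somewhere to record which form was
stored.

```python
import zlib


def pack_delta(delta: bytes) -> tuple[bool, bytes]:
    """Return (is_compressed, payload), keeping whichever form is smaller."""
    compressed = zlib.compress(delta)
    if len(compressed) < len(delta):
        return True, compressed
    # Compression did not help (typical for very small inputs, where the
    # zlib header and checksum outweigh any saving): keep the raw bytes.
    return False, delta
```

For a two-byte delta the raw form wins, while a long repetitive one is
stored compressed.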
> 2) Trees take up a big chunk of total space.
Tree objects can be deltafied too, but I didn't have time to script
it. A large space saving can be expected there as well, especially for
changesets that modify only a few files deep down the tree hierarchy.
> Therefore, I tried some other approaches. This one seemed to work
> best:
>
> 1) I limit the maximum size of any delta to 10% of the size of the new
> version. That guarantees a big saving, as long as any delta is
> produced.
Well, any delta object smaller than its original object saves space,
even if it's 75% of the original size. But...
> 2) If the "previous" version of a blob is a delta, I produce the new
> delta from the old delta's base version. This works surprisingly well.
> I'm guessing the reason for this is that most changes are really
> small, and they tend to be in the same area as a previous change (as
> in "Commit new feature. Commit bugfix for new feature. Commit fix for
> bugfix of new feature. Delete new feature as it doesn't work...").
... but then the ultimate solution is to try out all possible references
within a given list. My git-deltafy-script already finds out the list
of objects belonging to the same file. Maybe git-mkdelta should try
all combinations between them. This way a deeper delta chain could be
allowed for maximum space saving.
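The idea of trying every candidate could be sketched as below. This is
not what git-mkdelta does: the names estimate_delta_size, best_base and
COPY_COST are hypothetical, and difflib is only a stand-in that roughly
estimates the size of a copy/insert style delta. The max_ratio parameter
covers both policies discussed here: 1.0 accepts any delta smaller than
the object, 0.10 applies the 10% limit.

```python
from difflib import SequenceMatcher

COPY_COST = 8  # assumed bytes needed to encode one "copy from base" op


def estimate_delta_size(base: bytes, target: bytes) -> int:
    """Crude size estimate for a copy/insert delta from base to target."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    size = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            size += COPY_COST          # reuse bytes already present in base
        elif tag in ("replace", "insert"):
            size += 1 + (j2 - j1)      # assumed 1-byte header plus literals
        # "delete" costs nothing: those base bytes are simply not copied
    return size


def best_base(target: bytes, candidates: list[bytes], max_ratio: float = 1.0):
    """Return (index, estimated_size) of the best base, or None if no
    candidate yields a delta below max_ratio * len(target)."""
    limit = len(target) * max_ratio
    best = None
    for index, base in enumerate(candidates):
        size = estimate_delta_size(base, target)
        if size < limit and (best is None or size < best[1]):
            best = (index, size)
    return best
```

Calling best_base(new_version, older_versions, max_ratio=0.10) would keep
only candidates that satisfy the 10% limit. Every candidate is compared
against the target, so the cost grows with the length of the list.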
> 3) I use the same method for all tree objects.
Yup.
> Attached is a patch (against current cogito). It is basically the same
> as yours, Nicolas, except for some hackery to make the above possible.
> I'm sure I've made lots of stupid mistakes in it (and the 10% limit is
> hardcoded right now; I'm lazy).
I will look at it and merge the good stuff.
Thanks for testing!
Nicolas