git.vger.kernel.org archive mirror
From: Dan Holmsand <holmsand@gmail.com>
To: git@vger.kernel.org
Subject: Re: [PATCH] improved delta support for git
Date: Wed, 18 May 2005 10:54:54 +0200
Message-ID: <d6evrk$jv2$1@sea.gmane.org>
In-Reply-To: <Pine.LNX.4.62.0505180005230.20274@localhost.localdomain>

Nicolas Pitre wrote:
> My goal is to provide the mechanism that can be used by a higher level 
> implementing the deltafication policy.  I only provided one script as an 
> example, but it turns out that you found a way to achieve better space 
> saving.  And I bet you that there is probably ways to do even better 
> with more exhaustive delta targets.  For example you could try all 
> possible combinations on an object list for each file (and let it run 
> overnight).

Well, any kind of deltafication of, say, the complete kernel history 
pretty much has to run overnight anyway :-)

> One thing I've been wondering about is whether gzipping small deltas is 
> actually a gain.  For very small files it seems that gzip is adding more 
> overhead making the compressed file actually larger.  Might be worth 
> storing some deltas uncompressed if the compressed version turns out to 
> be larger.

It's probably better to skip deltafication of very small files 
altogether. Big pain for small gain, and all that.
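To put a rough number on the gzip overhead mentioned above, here is a 
minimal sketch using Python's zlib as a stand-in for whatever compression 
git applies. The helper stored_size() is hypothetical (not part of git); a 
real object format would also need a flag recording which form was stored.

```python
import zlib

def stored_size(payload):
    """Return (size, compressed?) for the smaller of the raw and deflated forms.

    Hypothetical helper for illustration only; not a git function.
    """
    packed = zlib.compress(payload)
    if len(packed) < len(payload):
        return len(packed), True
    return len(payload), False

tiny = b"tiny delta"                      # a very small delta body
big = b"a line that repeats\n" * 200      # a larger, compressible body

# For very small inputs the zlib header and checksum outweigh any saving.
print(len(tiny), len(zlib.compress(tiny)))
print(stored_size(tiny), stored_size(big))
```
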

>>1) I limit the maximum size of any delta to 10% of the size of the new
>>version. That guarantees a big saving, as long as any delta is
>>produced.
> 
> 
> Well, any delta object smaller than its original object saves space, 
> even if it's 75% of the original size. But...

That's not true if you want to keep the delta chain length down (and 
thus performance up).

Then, the most efficient approach is to generate many deltas against the 
same base file (otherwise, with a maximum delta depth of 1, at most 50% 
of the files can be stored as deltas).
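The 50% figure can be checked with a small counting sketch. It assumes 
that "chained" means each version is deltified against its immediate 
predecessor; the function names and the per_base parameter are made up 
for illustration.

```python
def chained(n, max_depth=1):
    """Each version deltas against its predecessor, subject to a depth cap."""
    kinds, prev_depth = [], None
    for _ in range(n):
        if prev_depth is None or prev_depth >= max_depth:
            # Predecessor is already at the depth limit: store a full copy.
            kinds.append("full")
            prev_depth = 0
        else:
            kinds.append("delta")
            prev_depth += 1
    return kinds

def shared_base(n, per_base):
    """One full object followed by per_base deltas against it (depth stays 1)."""
    return ["full" if i % (per_base + 1) == 0 else "delta" for i in range(n)]

print(chained(10).count("delta"))         # half of the objects are deltas
print(shared_base(10, 4).count("delta"))  # most of the objects are deltas
```
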

But in this case, the trick is to know when to stop deltafying against 
one base file, and start over with another. If you switch to a new 
keyframe too often, you obviously lose some potential savings. But if 
you don't switch often enough, you end up repeating the same data in too 
many delta files.

A maximum delta size of 10% turned out to be ideal for at least the "fs" 
tree. 8% was significantly worse, as was 15%. (The ideal size depends 
on how big the average change is: the smaller the average change, the 
smaller the max delta size should be).
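The keyframe policy described above might be sketched roughly as follows. 
This is an illustration under stated assumptions, not the actual script: 
approx_delta_size() is a hypothetical line-based estimate built on Python's 
difflib (git's delta encoder works differently), the 4-byte cost per 
instruction is invented, and max_ratio=0.10 mirrors the 10% limit.

```python
import difflib

def approx_delta_size(base, new):
    """Rough stand-in for a delta encoder: literal bytes plus per-op overhead."""
    a = base.splitlines(keepends=True)
    b = new.splitlines(keepends=True)
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    size = 0
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        size += 4                                  # assumed cost per instruction
        if tag in ("replace", "insert"):
            size += sum(len(line) for line in b[j1:j2])
    return size

def plan(versions, max_ratio=0.10):
    """Delta each version against the current base; start a new base when
    the delta would exceed max_ratio of the new version's size."""
    base, out = None, []
    for v in versions:
        if base is not None:
            d = approx_delta_size(base, v)
            if d <= max_ratio * len(v):
                out.append(("delta", d))
                continue
        out.append(("full", len(v)))               # new keyframe
        base = v
    return out

# Synthetic history: version k has its first k lines changed.
lines = [b"line %03d: some text here\n" % i for i in range(100)]
versions = [b"".join(l.upper() if i < k else l for i, l in enumerate(lines))
            for k in range(13)]
kinds = [kind for kind, _ in plan(versions)]
print(kinds)
```

In this synthetic history the changes accumulate relative to the first 
version, so the deltas grow until one crosses the limit and that version 
becomes the next base.
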

> ... but then the ultimate solution is to try out all possible references 
> within a given list.  My git-deltafy-script already finds out the list 
> of objects belonging to the same file.  Maybe git-mkdelta should try 
> all combinations between them.  This way a deeper delta chain could be 
> allowed for maximum space saving.

Yeah. But then you lose the ability to do incremental deltafication, or 
deltafication on-the-fly. And it would be really, really nice to have 
git do deltas at commit time - that way you could keep the very cool 
"immutable objects" property of git, while still saving a lot of space.

> I will look at it and merge the good stuff.

Cool! Thanks!

/dan

