git.vger.kernel.org archive mirror
From: Dan Holmsand <holmsand@gmail.com>
To: git@vger.kernel.org
Subject: Re: [PATCH] improved delta support for git
Date: Wed, 18 May 2005 21:32:06 +0200
Message-ID: <d6g569$hln$1@sea.gmane.org>
In-Reply-To: <Pine.LNX.4.62.0505181428170.20274@localhost.localdomain>

Nicolas Pitre wrote:
> On Wed, 18 May 2005, Dan Holmsand wrote:
>>Nicolas Pitre wrote:
>>It's probably better to skip deltafication of very small files altogether. Big
>>pain for small gain, and all that.
> 
> No, that's not what I mean.
> 
> Suppose a large source file that may change only one line between two 
> versions.  The delta may therefore end up being only a few bytes long.  
> Compressing a few bytes with zlib creates a _larger_ file than the 
> original few bytes.

Oh, I see. You're right, of course. I doubt, however, that there are 
really large gains to be made, measured in bytes, since small files tend 
to be small :-) It might nevertheless be worthwhile to skip processing 
of really small files, as there's basically no hope of gaining 
anything by deltafication.
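
To make the zlib point from the quoted text concrete, here is a minimal sketch in Python. The helper name zlib_overhead and the sample payload are made up for illustration; a zlib stream carries a 2-byte header and a 4-byte Adler-32 trailer on top of the deflate data, so a payload of a few bytes cannot come out smaller.

```python
import zlib


def zlib_overhead(payload: bytes) -> int:
    """Return how many bytes zlib adds (positive) or saves (negative)."""
    return len(zlib.compress(payload)) - len(payload)


# Hypothetical few-byte delta, standing in for a one-line change.
tiny_delta = b"\x05\x90\x10abc"

# Positive for the tiny payload, negative for a large repetitive one.
print(zlib_overhead(tiny_delta))
print(zlib_overhead(b"a" * 10000))
```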

Anyway, I've also noticed that deltas compress a lot worse than regular 
blobs. In particular, "complex deltas" (i.e. small changes on lines 
1,3,5,9,12, etc.) compress poorly. That might explain why my simplistic 
"depth-one-deltas-against-a-common-keyframe" method works comparatively 
well, as that should tend to produce cleaner deltas from time to time 
(i.e. the chunk on lines 1-12 got replaced by new stuff), which ought to 
be easier to compress.

>>>Well, any delta object smaller than its original object saves space, even if
>>>it's 75% of the original size. But...
>>
>>That's not true if you want to keep the delta chain length down (and thus
>>performance up).
> 
> Sure.  That's why I added the -d switch to mkdelta.  But if you can fit 
> a delta which is 75% the size of its original object size then you still 
> save 25% of the space, regardless of the delta chain length.

Ok, I guess we're talking about slightly different things here. I was 
talking about my simple-and-fast method of processing one new version of 
an object at a time, deltafying each new version against the first 
previous non-deltafied one. If you, in that scenario, allow deltas of up 
to 75% of the original size, the average delta size will probably be 
something like 37%.
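
The depth-one scheme described above can be sketched roughly as follows. This is not the actual script: the delta format, the assumed cost of 8 bytes per copy op, and every function name here (make_delta, apply_delta, delta_size, store_versions) are hypothetical stand-ins, with Python's difflib used in place of git's real delta code.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Toy delta: ('copy', offset, length) and ('insert', data) ops."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # 'replace' or 'insert': emit the new bytes literally
            ops.append(("insert", target[j1:j2]))
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from its base and the toy delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def delta_size(ops: list) -> int:
    """Assumed encoding cost: 8 bytes per copy, 1 + len(data) per insert."""
    return sum(8 if op[0] == "copy" else 1 + len(op[1]) for op in ops)


def store_versions(versions, max_ratio=0.10):
    """Delta each version against the last full object (the keyframe).

    A version whose delta exceeds max_ratio of its own size is stored in
    full and becomes the new keyframe, so no delta chain is deeper than one.
    """
    stored = []
    keyframe = None
    for data in versions:
        if keyframe is not None:
            ops = make_delta(keyframe, data)
            if delta_size(ops) <= max_ratio * len(data):
                stored.append(("delta", keyframe, ops))
                continue
        keyframe = data
        stored.append(("full", data))
    return stored
```

Calling store_versions(versions, max_ratio=0.75) corresponds to the 75% limit discussed here. One possible reading of the 37% figure, which is my assumption and not something stated in the thread: if accepted delta sizes were spread evenly between 0% and 75% of the original, their mean would be 37.5%.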

> In fact it seems that deltas might be significantly harder to compress.  
> Therefore a test on the resulting file should probably be done as well 
> to make sure we don't end up with a delta larger than the original 
> object.

Yeah (see above). That's just one of the things I cheated my way out of, 
by limiting delta size to 10% (I'm trusting that compression doesn't 
increase size by 90%).
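
The test on the resulting file suggested in the quoted text could look something like the sketch below: compress both candidates and keep whichever ends up smaller. The function name choose_representation is hypothetical, and this is not how mkdelta decides.

```python
import zlib


def choose_representation(full: bytes, delta: bytes):
    """Compare sizes after compression and keep the smaller candidate.

    Returns a (kind, compressed_bytes) pair, where kind is 'delta' or 'full'.
    """
    packed_full = zlib.compress(full)
    packed_delta = zlib.compress(delta)
    if len(packed_delta) < len(packed_full):
        return "delta", packed_delta
    return "full", packed_full
```

Comparing after compression matters because a delta that is smaller than the object in raw bytes can still lose once both have gone through zlib, for example when the full object is highly repetitive and the delta is not.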

>>>... but then the ultimate solution is to try out all possible references
>>>within a given list.  My git-deltafy-script already finds out the list of
>>>objects belonging to the same file.  Maybe git-mkdelta should try all
>>>combinations between them.  This way a deeper delta chain could be allowed
>>>for maximum space saving.
>>
>>Yeah. But then you lose the ability to do incremental deltafication, or
>>deltafication on-the-fly.
> 
> 
> Not at all.  Nothing prevents you from making the latest revision of a 
> file be the reference object and the previous revision turned into a 
> delta against that latest revision, even if it was itself a reference 
> object before.  The only thing that must be avoided is a delta loop and 
> current mkdelta code takes care of that already.

Sure. But there's some downside to modifying already existing objects. 
In particular, downloads using the existing methods (both http and 
rsync) won't get your new, smaller objects. If someone is pulling from a 
deltafied repository often enough, and the objects of the top-most 
commit are always non-deltafied, they will never see any deltas. Object 
immutability is a really good thing.

And there might be some performance issues involved, if you'd like to do 
deltafying at commit-time (not that I've actually tried this, or 
anything)...
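
For the delta-loop constraint mentioned in the quoted text, a minimal sketch of the check might be as follows. The base_of mapping and the function name are hypothetical, and I have not looked at how mkdelta actually implements this.

```python
def would_create_loop(base_of: dict, obj: str, new_base: str) -> bool:
    """True if making `obj` a delta against `new_base` would close a cycle.

    `base_of` maps each deltafied object id to the id of its base; full
    objects have no entry. The existing mapping is assumed to be loop-free.
    """
    current = new_base
    while current is not None:
        if current == obj:
            return True
        current = base_of.get(current)
    return False
```

With base_of = {"v1": "v2"}, turning v2 into a delta against a new full object v3 is allowed, while turning v2 into a delta against v1 is refused because v1 already depends on v2.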

Thanks for your comments!

/dan


