From: Dan Holmsand <holmsand@gmail.com>
To: git@vger.kernel.org
Subject: Re: [PATCH] improved delta support for git
Date: Wed, 18 May 2005 21:32:06 +0200
Message-ID: <d6g569$hln$1@sea.gmane.org>
In-Reply-To: <Pine.LNX.4.62.0505181428170.20274@localhost.localdomain>
Nicolas Pitre wrote:
> On Wed, 18 May 2005, Dan Holmsand wrote:
>>Nicolas Pitre wrote:
>>It's probably better to skip deltafication of very small files altogether. Big
>>pain for small gain, and all that.
>
> No, that's not what I mean.
>
> Suppose a large source file that may change only one line between two
> versions. The delta may therefore end up being only a few bytes long.
> Compressing a few bytes with zlib creates a _larger_ file than the
> original few bytes.
Oh, I see. You're right, of course. I doubt, however, that there are
really large gains to be made, measured in bytes, since small files tend
to be small :-) It might still be worthwhile to skip processing of
really small files, as there's basically no hope of gaining anything by
deltafication.
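To illustrate the point about zlib and tiny deltas, here is a minimal
Python sketch. The payload is a made-up few-byte string, not git's actual
delta encoding; it only shows that zlib's fixed framing (a 2-byte header
plus a 4-byte Adler-32 checksum) makes very short inputs grow.

```python
import zlib

# Hypothetical few-byte delta payload (NOT git's real delta format).
tiny_delta = b"\x05\x03abc"
compressed = zlib.compress(tiny_delta)

# The zlib stream carries a 2-byte header and a 4-byte Adler-32 trailer,
# so an input this short ends up larger after "compression".
print(len(tiny_delta), len(compressed))
```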
Anyway, I've also noticed that deltas compress a lot worse than regular
blobs. In particular, "complex deltas" (i.e. small changes on lines
1,3,5,9,12, etc.) compress poorly. That might explain why my simplistic
"depth-one-deltas-against-a-common-keyframe" method works comparatively
well, as it should tend to produce cleaner deltas from time to time
(e.g. the chunk on lines 1-12 got replaced by new stuff), which ought to
be easier to compress.
>>>Well, any delta object smaller than its original object saves space, even if
>>>it's 75% of the original size. But...
>>
>>That's not true if you want to keep the delta chain length down (and thus
>>performance up).
>
> Sure. That's why I added the -d switch to mkdelta. But if you can fit
> a delta which is 75% the size of its original object size then you still
> save 25% of the space, regardless of the delta chain length.
Ok, I guess we're talking about slightly different things here. I was
talking about my simple-and-fast method of processing one new version of
an object at a time, deltafying each new version against the first
previous non-deltafied one. If you, in that scenario, allow deltas of up
to 75% of the object size, the average delta size will probably be
something like 37%.
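A rough Python sketch of the one-version-at-a-time keyframe scheme
described above. Everything here is an assumption for illustration: the
names delta_size and store_versions are hypothetical, and the delta cost
model is a stand-in built on difflib, not what mkdelta does. (For the
37% figure: if accepted delta sizes were spread evenly between 0% and
75%, the mean would be 37.5%; that even spread is itself an assumption.)

```python
import difflib

def delta_size(base: bytes, target: bytes) -> int:
    """Estimate a delta's size with an assumed cost model:
    4 bytes per copy instruction, 1 opcode byte plus the data per insert."""
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    size = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            size += 4                  # assumed cost of a copy instruction
        elif tag in ("replace", "insert"):
            size += 1 + (j2 - j1)      # assumed opcode byte + literal bytes
        # "delete" contributes no data to the delta
    return size

def store_versions(versions, max_ratio=0.75):
    """Decide, for each version in order, whether to store it whole or as
    a delta against the most recent non-deltafied version (the keyframe)."""
    decisions = []
    keyframe = None
    for idx, data in enumerate(versions):
        if keyframe is not None:
            if delta_size(versions[keyframe], data) <= max_ratio * len(data):
                decisions.append(("delta", keyframe))
                continue
        # Too different (or first version): store whole, becomes the keyframe.
        decisions.append(("full", None))
        keyframe = idx
    return decisions

v1 = b"line one\nline two\nline three\n" * 4
v2 = v1.replace(b"two", b"2wo", 1)          # one small change
v3 = b"completely different content xyz"   # unrelated rewrite
decisions = store_versions([v1, v2, v3])
print(decisions)
```

Since every delta points directly at a full object, the chain depth
never exceeds one in this sketch.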
> In fact it seems that deltas might be significantly harder to compress.
> Therefore a test on the resulting file should probably be done as well
> to make sure we don't end up with a delta larger than the original
> object.
Yeah (see above). That's just one of the things I cheated my way out of,
by limiting delta size to 10% (I'm trusting that compression doesn't
increase size by 90%).
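The check suggested in the quoted text (compare the results after
compression, not before) could look roughly like this in Python. The
function name worth_storing_as_delta is hypothetical, and the byte
strings are made-up examples rather than real git objects.

```python
import zlib

def worth_storing_as_delta(delta: bytes, original: bytes) -> bool:
    """Keep the delta only if it is smaller than the original object
    once both have been zlib-compressed."""
    return len(zlib.compress(delta)) < len(zlib.compress(original))

source_file = b"int main(void) { return 0; }\n" * 50
small_delta = b"\x01\x02x"                 # made-up tiny delta
noisy_delta = bytes(range(256)) * 4        # made-up delta that compresses badly
repetitive_file = b"a" * 1000              # original that compresses very well

print(worth_storing_as_delta(small_delta, source_file))
print(worth_storing_as_delta(noisy_delta, repetitive_file))
```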
>>>... but then the ultimate solution is to try out all possible references
>>>within a given list. My git-deltafy-script already finds out the list of
>>>objects belonging to the same file. Maybe git-mkdelta should try all
>>>combinations between them. This way a deeper delta chain could be allowed
>>>for maximum space saving.
>>
>>Yeah. But then you lose the ability to do incremental deltafication, or
>>deltafication on-the-fly.
>
>
> Not at all. Nothing prevents you from making the latest revision of a
> file be the reference object and the previous revision turned into a
> delta against that latest revision, even if it was itself a reference
> object before. The only thing that must be avoided is a delta loop and
> current mkdelta code takes care of that already.
Sure. But there's some downside to modifying already existing objects.
In particular, downloads using the existing methods (both http and
rsync) won't get your new, smaller objects. If someone is pulling from a
deltafied repository often enough, and the objects of the top-most
commit are always non-deltafied, they will never see any deltas. Object
immutability is a really good thing.
And there might be some performance issues involved, if you'd like to do
deltafying at commit-time (not that I've actually tried this, or
anything)...
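For reference, the delta-loop constraint from the quoted text can be
sketched as below. This is not mkdelta's code; the refs mapping and the
name would_create_loop are assumptions made up for the example.

```python
def would_create_loop(refs: dict, obj: str, new_base: str) -> bool:
    """refs maps each deltafied object id to the id of its base object.
    Return True if storing obj as a delta against new_base would make
    the chain lead back to obj (or if the chain already contains a cycle)."""
    seen = {obj}
    cur = new_base
    while cur is not None:
        if cur in seen:
            return True
        seen.add(cur)
        cur = refs.get(cur)    # None once we reach a non-deltafied object
    return False

# v1 is currently stored as a delta against v2.
refs = {"v1": "v2"}
print(would_create_loop(refs, "v2", "v1"))   # v2 -> v1 -> v2 would loop
print(would_create_loop(refs, "v2", "v3"))   # v3 is a plain object
```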
Thanks for your comments!
/dan