From: Dan Holmsand <holmsand@gmail.com>
To: git@vger.kernel.org
Subject: Re: RFC: adding xdelta compression to git
Date: Tue, 03 May 2005 14:48:14 +0200
Message-ID: <d57rip$ojm$1@sea.gmane.org>
In-Reply-To: <Pine.LNX.4.58.0505022131380.3594@ppc970.osdl.org>
Linus Torvalds wrote:
> Also, the fact is, since git saves things as separate files, you'd not win
> as much as you would with some other backing store. So the second step is
> to start packing the objects etc. I think there is actually a very steep
> complexity edge here - not because any of the individual steps necessarily
> add a whole lot, but because they all lead to the "next step".
Actually, you can win quite a lot.
I've just been playing with storing the entire
linux-2.4.0-to-2.6.12-rc2-patchset as xdelta patches in git. The entire
thing ended up being 577M (instead of some 3.5G), according to du -sh
--apparent-size. Considering that that's some 800M of patches, that's
not too bad.
I used a very simple scheme: I stored a delta to the previous version of
every file if that delta was less than 20% of the size of the new file
(otherwise, the whole file was stored as usual). If the previous version
was already in delta form, the delta was computed from that version's
"parent". I didn't even bother to look at files less than 4k in size.
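The decision rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the experiment: the names (`Stored`, `store_version`, `make_delta`, `apply_delta`) are hypothetical, and zlib with a preset dictionary stands in for xdelta, which is not part of the Python standard library. Only the 20% threshold, the 4k cutoff, and the "delta against the parent, never against another delta" rule come from the message.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

MIN_SIZE = 4 * 1024   # files smaller than 4k are never considered for deltas
MAX_RATIO = 0.20      # a delta is kept only if it is under 20% of the new file


@dataclass
class Stored:
    """One stored version: either a whole file or a delta against a whole file."""
    payload: bytes
    base: Optional["Stored"] = None   # None means payload is the whole file

    def expand(self) -> bytes:
        if self.base is None:
            return self.payload
        return apply_delta(self.base.payload, self.payload)


def make_delta(base: bytes, new: bytes) -> bytes:
    # Stand-in for xdelta (assumption): deflate `new` with `base` as dictionary.
    c = zlib.compressobj(zdict=base)
    return c.compress(new) + c.flush()


def apply_delta(base: bytes, delta: bytes) -> bytes:
    d = zlib.decompressobj(zdict=base)
    return d.decompress(delta) + d.flush()


def store_version(new: bytes, prev: Optional[Stored]) -> Stored:
    """Decide how to store `new`, given the previously stored version."""
    if prev is None or len(new) < MIN_SIZE:
        return Stored(new)
    # If the previous version is itself a delta, diff against its parent,
    # so that no delta ever depends on another delta.
    base = prev.base if prev.base is not None else prev
    if base.payload:
        delta = make_delta(base.payload, new)
        if len(delta) < MAX_RATIO * len(new):
            return Stored(delta, base)
    return Stored(new)
```

Because `base` is always a whole file, expanding any version takes at most one delta application, which is what makes the scheme chain-free.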
In other words, I didn't have to use any delta "chains", and still got
quite massive storage size gains. And this scheme could easily be used
on the fly; I'm guessing that it would be performance neutral (or even a
slight gain, since less has to be compressed and written to disk on
commits, and there might be less to read when diffing, since two
"delta-blobs" might share the same parent).
Or, with careful tuning, a repo might be "deltaified" later on (assuming
that delta blobs are addressed using the expanded files' hash).
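To illustrate why addressing by the expanded file's hash matters for later "deltification", here is a minimal sketch. It assumes object names are the SHA-1 of a `blob <size>\0` header plus the expanded contents, as in present-day git; the `Store` class and its methods are hypothetical, and zlib with a preset dictionary again stands in for xdelta. Note that this sketch does not by itself prevent delta chains; a caller would need to pick only whole objects as bases.

```python
import hashlib
import zlib


def object_id(expanded: bytes) -> str:
    """Name an object by the hash of its expanded contents."""
    header = b"blob %d\x00" % len(expanded)
    return hashlib.sha1(header + expanded).hexdigest()


class Store:
    def __init__(self) -> None:
        # object id -> (base id or None, payload)
        self.objects = {}

    def put(self, expanded: bytes) -> str:
        oid = object_id(expanded)
        self.objects.setdefault(oid, (None, expanded))
        return oid

    def get(self, oid: str) -> bytes:
        base_oid, payload = self.objects[oid]
        if base_oid is None:
            return payload
        d = zlib.decompressobj(zdict=self.get(base_oid))
        return d.decompress(payload) + d.flush()

    def deltify(self, oid: str, base_oid: str) -> None:
        """Swap the stored form of `oid` for a delta; its name does not change."""
        if oid == base_oid:
            raise ValueError("an object cannot be its own base")
        expanded = self.get(oid)
        c = zlib.compressobj(zdict=self.get(base_oid))
        self.objects[oid] = (base_oid, c.compress(expanded) + c.flush())
```

Since the key depends only on the expanded contents, anything that refers to the object by its ID keeps working after `deltify` rewrites the stored representation.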
/dan