From: Avery Pennarun <apenwarr@gmail.com>
To: "Hervé Cauwelier" <herve@itaapy.com>
Cc: git@vger.kernel.org
Subject: Re: Multiblobs
Date: Fri, 30 Apr 2010 13:32:39 -0400
Message-ID: <z2p32541b131004301032jd28b4b0azbb600880f4e15871@mail.gmail.com>
In-Reply-To: <4BDA9F5C.2080808@itaapy.com>
2010/4/30 Hervé Cauwelier <herve@itaapy.com>:
> I'll obviously let the Git experts answer you, but I can answer about
> OpenDocument itself.
>
> In a presentation each slide is a <draw:page/> inside a single content.xml.
> So if you change one slide, the whole XML will serialize with a different
> SHA.
>
> And maybe you'll add style to that slide, or probably OpenOffice.org will
> generate an automatic style, so styles.xml will also change. Adding an image
> also changes manifest.xml, along with storing the image itself. OOo will
> surely record the last slide displayed when closing the application, so
> settings.xml will change too.
>
> So, all in all, for a single slide, 30 to 80 % of the Zip content may
> change.
Sure. But if you name the chunks consistently, git's delta
compression can deal with tiny changes like those very easily.
The question is whether it'll work equally well, or better, or worse,
with a one-big-file format. I think we won't know this without doing
some actual tests.
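
As a starting point for such a test, here is a rough sketch (not from the thread) that measures how much of a Zip container changes between two saves, counted per member. The helper names and the toy `content.xml`/`styles.xml` payloads are made up for illustration; a real test would read two saved versions of an actual document. A member counts as fully "changed" if its hash differs, which is the coarse view; git's delta compression would look inside each changed member.

```python
import hashlib
import io
import zipfile


def member_hashes(zip_bytes: bytes) -> dict:
    """Map each member name to (SHA-1 of its uncompressed content, size)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: (hashlib.sha1(zf.read(info)).hexdigest(), info.file_size)
            for info in zf.infolist()
        }


def changed_fraction(old_zip: bytes, new_zip: bytes) -> float:
    """Fraction of uncompressed bytes in new_zip held by new or modified members."""
    old, new = member_hashes(old_zip), member_hashes(new_zip)
    total = sum(size for _, size in new.values())
    changed = sum(
        size
        for name, (digest, size) in new.items()
        if old.get(name, (None, 0))[0] != digest
    )
    return changed / total if total else 0.0


def make_zip(members: dict) -> bytes:
    """Build an in-memory Zip archive from {name: content} (toy data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    return buf.getvalue()


# Two hypothetical saves: one slide edited, styles untouched.
v1 = make_zip({
    "content.xml": b"<slide>one</slide>" * 100,
    "styles.xml": b"<style/>" * 100,
})
v2 = make_zip({
    "content.xml": b"<slide>one</slide>" * 99 + b"<slide>two</slide>",
    "styles.xml": b"<style/>" * 100,
})

print(f"{changed_fraction(v1, v2):.0%} of the uncompressed bytes sit in changed members")
```

A high per-member figure does not by itself say how well either layout packs; that still needs a real repository and real documents.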
(Normally, you could assume that one-big-file is the most
space-efficient storage format, because then xdelta and gzip have the
most data to work with. But if you have a lot of *duplicated* content
inside the same file, and the distance between duplications is outside
the gzip window, you could find that more unusual methods - like the
method used by bup - result in better compression. I know this is
true for VM images, so it may be true for other things. I haven't
tested everything :))
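
The window effect described above can be illustrated with a small sketch, under stated assumptions. DEFLATE (used by gzip and zlib) can only reference data up to 32 KiB back, so a 64 KiB block repeated right after itself is not recognized as a duplicate. The chunk-level deduplication below uses fixed-size chunks purely to keep the example short; bup itself splits on a rolling checksum, so this is a simplified stand-in and not bup's algorithm. The function names are made up for illustration.

```python
import hashlib
import os
import zlib


def deflate_size(data: bytes) -> int:
    """Size of the whole buffer compressed as a single zlib stream."""
    return len(zlib.compress(data, 9))


def dedup_size(data: bytes, chunk_size: int = 4096) -> int:
    """Total compressed size when each distinct chunk is stored only once.

    Fixed-size chunking is a simplification chosen for this sketch.
    """
    seen = set()
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha1(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            total += len(zlib.compress(chunk, 9))
    return total


# Incompressible 64 KiB block, repeated: the copy starts 64 KiB after the
# original, which is beyond DEFLATE's 32 KiB back-reference window.
block = os.urandom(64 * 1024)
data = block + block

print("single zlib stream:", deflate_size(data), "bytes")
print("chunk dedup + zlib:", dedup_size(data), "bytes")
```

On this contrived input the single stream stays close to the full 128 KiB while the deduplicated form stays close to 64 KiB. Real files will sit somewhere in between, which is why measuring is still needed.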
> You may also be interested in the git-bigfiles project that was mentioned
> last week.
>
> http://caca.zoy.org/wiki/git-bigfiles
git-bigfiles is a worthwhile project. Its goal of "make life
bearable" is aiming kind of low, though. Basically they seem to be
aiming simply to make git not die horribly when given lots of large
files. This is commendable, but the resulting repo will be very
space-inefficient when your large files change frequently in small ways. So
I think it doesn't solve the problem Sergio brought up.
Have fun,
Avery