From: Sergio <sergio.callegari@gmail.com>
To: git@vger.kernel.org
Subject: Re: Tracking OpenOffice files/other compressed files with Git
Date: Tue, 9 Sep 2008 09:02:36 +0000 (UTC)
Message-ID: <loom.20080909T085002-376@post.gmane.org>
In-Reply-To: 48C61F94.3060400@viscovery.net
Johannes Sixt <j.sixt <at> viscovery.net> writes:
>
> Peter Krefting schrieb:
> > Since OpenOffice documents are just zipped XML files, I wondered how
> > difficult it would be to create some hooks/hack git to track the files
> > inside the archives instead?
>
> You could write a "clean" filter that "recompresses" the archive with
> level 0 upon git-add.
>
A couple of notes:
1) For OpenOffice documents whose size is dominated by embedded images and
other large objects, the git delta mechanism already performs reasonably well,
because OO files are Zip archives in which each member is compressed
separately. If you do not change an image, that image stays stored exactly the
same way and the delta can still be found.
2) For OO documents whose size is dominated by plain content, the git delta
mechanism cannot work, since the zip compression introduces "mixing" and a small
change in the document turns into a very large change in the zip file.
One could write a clean filter that uncompresses the archive before commit.
The catch is the complementary smudge filter used at checkout. If the smudge
step does not restore the file exactly, git always shows the file as changed
with respect to the index. Smudging correctly means recompressing with the very
same compression level and method that OO uses, which is tricky. I have tried
using the zip command-line tool in both the clean and the smudge phases and it
does not work well: the smudged file always differs from the original one. One
would probably have to work at a lower level (e.g. with libzip) to get finer
control over what is happening, and prepend to the uncompressed file the
compression parameters that the smudge step must restore.
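As a rough illustration of the level-0 "clean" filter Johannes suggests, here is a minimal Python sketch using only the standard zipfile module. The script name odf_filter.py, its "clean" argument and the filter name "odf" are hypothetical. It covers only the clean side and does not record the original compression parameters, so it does not solve the smudge problem described above.

```python
import io
import sys
import zipfile


def store_uncompressed(data: bytes) -> bytes:
    """Rewrite a zip archive so that every member is stored (level 0).

    Member order, names, timestamps and external attributes are copied;
    the original compression method/level is NOT recorded, so a smudge
    filter could not reproduce the original archive from this output.
    """
    src = zipfile.ZipFile(io.BytesIO(data))
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():  # preserves the original member order
            payload = src.read(info)
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.external_attr = info.external_attr
            new_info.compress_type = zipfile.ZIP_STORED  # no compression
            dst.writestr(new_info, payload)
    return out.getvalue()


if __name__ == "__main__" and sys.argv[1:] == ["clean"]:
    # Git runs the clean filter with the file content on stdin and
    # expects the cleaned content on stdout.
    sys.stdout.buffer.write(store_uncompressed(sys.stdin.buffer.read()))
```

It could be wired up with a line such as "*.odt filter=odf" in .gitattributes plus git config filter.odf.clean "python3 odf_filter.py clean"; the smudge side is deliberately left unspecified here.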
The bigger issue, however, is that the clean/smudge round trip can be really
slow when dealing with large OO files.