* Management of opendocument (openoffice.org) files in git
@ 2008-09-15 22:40 Sergio Callegari
2008-09-16 6:45 ` Matthieu Moy
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Sergio Callegari @ 2008-09-15 22:40 UTC
To: git
Hi,
Management of opendocument files in git was discussed a short time ago.
Here is a helper script that may help achieve better density in git packs
containing blobs from openoffice files.
To try it, save the following as "rezip" with execution permission:
-----8<-----------------------
#! /bin/bash
#
# (c) 2008 Sergio Callegari
#
# Rewrites a zip archive, possibly changing the compression level
USAGE='Usage: rezip [options] [file]
with options:
[-h | --help] Gives help
[-p ?] Lists known profiles
[--unzip_opts options] Pass options to unzip helper to read zip file
[--zip_opts options] Pass options to zip helper to write zip file
[-p | --profile profile] Get options for helpers from profile
Rewrites a zip archive, possibly changing the compression level.
If the archive name is unspecified, then the command operates like a filter,
reading from standard input and writing to standard output.
Options can be manually provided to the unzip process doing the read and to
the zip process doing the write. Alternatively a profile can be used to set
options automatically.'
PROFILES="ODF_UNCOMPRESS ODF_COMPRESS"
PROFILE_UNZIP_ODF_UNCOMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_UNCOMPRESS='-q -r -D -0'
PROFILE_UNZIP_ODF_COMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_COMPRESS='-q -r -D -6'
die()
{
echo "$1" >&$2
exit $3
}
UNZIP_OPTS=""
ZIP_OPTS=""
while true ; do
case "$1" in
-h | --help)
die "$USAGE" 1 0 ;;
-p | --profile)
if [ "$2" = "?" ] ; then
die "Available profiles: ${PROFILES}" 1 0 ;
else
profile=$2
shift
profile_unzip=PROFILE_UNZIP_${profile}
profile_zip=PROFILE_ZIP_${profile}
UNZIP_OPTS=${!profile_unzip}
ZIP_OPTS=${!profile_zip}
fi ;;
--unzip_opts)
UNZIP_OPTS="${UNZIP_OPTS} $2"
shift ;;
--zip_opts)
ZIP_OPTS="${ZIP_OPTS} $2"
shift ;;
-*)
die "$USAGE" 2 1 ;;
*)
break ;;
esac
shift
done
if [ $# = 0 ] ; then
tmpcopy=$(mktemp rezip.zip.XXXXXX)
cat > $tmpcopy
filename="$tmpcopy"
else
tmpcopy=""
filename="$1"
fi
workdir=$(mktemp -d -t rezip.workdir.XXXXXX)
curdir=$(pwd)
cd $workdir
unzip $UNZIP_OPTS "$curdir/$filename"
zip $ZIP_OPTS "$curdir/$filename" .
cd $curdir
rm -fr $workdir
if [ ! -z "$tmpcopy" ] ; then
cat $filename
rm $tmpcopy
fi
--------8<------------------------
then put in your .git/config something like
[filter "opendocument"]
clean = "rezip -p ODF_UNCOMPRESS"
smudge = "rezip -p ODF_COMPRESS"
and finally set gitattributes as
*.odt filter=opendocument
*.ods filter=opendocument
*.odp filter=opendocument
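The same setup can be scripted. The following is only a sketch: the function names are hypothetical, and it assumes the "rezip" script above is on your PATH and that you run configure_odf_filter from inside the repository.

```shell
#!/bin/sh
# Append the three ODF patterns to <dir>/.gitattributes.
# (Hypothetical helper; this step does not need git itself.)
write_odf_attributes() {
    for ext in odt ods odp; do
        printf '*.%s filter=opendocument\n' "$ext"
    done >> "$1/.gitattributes"
}

# Register the clean/smudge commands in the current repository's config.
# Assumes "rezip" is on PATH; must be run from inside a git work tree.
configure_odf_filter() {
    git config filter.opendocument.clean  "rezip -p ODF_UNCOMPRESS"
    git config filter.opendocument.smudge "rezip -p ODF_COMPRESS"
}
```

Calling write_odf_attributes . followed by configure_odf_filter should leave you with the configuration shown above.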
Note:
with this you might experience some delay on operations like
git status
git add
git commit -a
git checkout
depending on the size of the opendocument files being tracked.
Before using on anything sensitive, please test that it does what it should.
The script should probably be made more robust against unexpected situations.
Hope it can be useful to someone.
Sergio
* Re: Management of opendocument (openoffice.org) files in git
@ 2008-09-16 6:24 Paolo Bonzini
2008-09-16 7:05 ` Sergio Callegari
0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2008-09-16 6:24 UTC
To: Sergio Callegari, Git Mailing List
> profile_unzip=PROFILE_UNZIP_${profile}
> profile_zip=PROFILE_ZIP_${profile}
> UNZIP_OPTS=${!profile_unzip}
> ZIP_OPTS=${!profile_zip}
This can be written (in pure Bourne shell) as
eval UNZIP_OPTS=\$PROFILE_UNZIP_${profile}
eval ZIP_OPTS=\$PROFILE_ZIP_${profile}
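Since eval executes whatever ends up in the string, it may be worth checking the profile name first. A minimal sketch of that idea (lookup_profile is a hypothetical helper, not part of the posted script):

```shell
#!/bin/sh
PROFILE_UNZIP_ODF_COMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_COMPRESS='-q -r -D -6'

# lookup_profile KIND NAME
# Prints the value of PROFILE_<KIND>_<NAME> without bash's ${!var}.
# Returns 1 if KIND/NAME contain anything but letters, digits or "_",
# so that nothing unexpected ever reaches eval.
lookup_profile() {
    case "$1$2" in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    eval "printf '%s\n' \"\${PROFILE_$1_$2}\""
}
```

With this, ZIP_OPTS=$(lookup_profile ZIP "$profile") would replace the second eval line.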
> --unzip_opts)
> UNZIP_OPTS=${UNZIP_OPTS} $2
Missing quotes:
UNZIP_OPTS="${UNZIP_OPTS} $2"
It could also be a good idea to do
UNZIP_OPTS="${UNZIP_OPTS} `echo $2 | sed 'y/,/ /' `"
(compare with the -Wa/-Wl/-Wp options of gcc) so you can do
[filter "opendocument"]
clean = "rezip --unzip-opts -b,-qq,-X --zip-opts -q,-r,-D,-0"
smudge = "rezip --unzip-opts -b,-qq,-X --zip-opts -q,-r,-D,-6"
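The comma convention can be tried on its own. This is just an illustration with hypothetical function names; it relies on the options themselves never containing commas or spaces.

```shell
#!/bin/sh
# Turn "-q,-r,-D,-0" into "-q -r -D -0", in the style of gcc's -Wl,a,b.
commas_to_spaces() {
    printf '%s\n' "$1" | sed 'y/,/ /'
}

# Count the words the shell sees once the result is expanded unquoted,
# which is how rezip later hands the options to zip and unzip.
count_words() {
    set -- $(commas_to_spaces "$1")   # word splitting is intended here
    echo "$#"
}
```

The point of the convention is that the whole option list travels as a single shell word on the filter command line and is only split inside rezip.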
And maybe -b,-qq,-X and -q,-r respectively could be added by default?
Anyway, nice script, thanks!
Paolo
* Re: Management of opendocument (openoffice.org) files in git
2008-09-15 22:40 Sergio Callegari
@ 2008-09-16 6:45 ` Matthieu Moy
2008-09-16 7:41 ` Sergio Callegari
2008-09-16 7:09 ` Johannes Sixt
2008-09-23 11:08 ` Peter Krefting
2 siblings, 1 reply; 15+ messages in thread
From: Matthieu Moy @ 2008-09-16 6:45 UTC
To: Sergio Callegari; +Cc: git
Sergio Callegari <sergio.callegari@gmail.com> writes:
> Hi,
>
> Management of opendocument files in git was discussed a short time ago.
> Here is a helper script that may help achieve better density in git packs
> containing blobs from openoffice files.
If you don't get "oh, sh*t, I lost data with it"-kind of feedback, can
you add it to the wiki:
http://git.or.cz/gitwiki/GitTips#head-1cdd4ab777e74f12d1ffa7f0a793e46dd06e5945
Thanks,
--
Matthieu
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 6:24 Management of opendocument (openoffice.org) files in git Paolo Bonzini
@ 2008-09-16 7:05 ` Sergio Callegari
2008-09-16 8:12 ` Paolo Bonzini
0 siblings, 1 reply; 15+ messages in thread
From: Sergio Callegari @ 2008-09-16 7:05 UTC
To: Paolo Bonzini; +Cc: Git Mailing List
Thanks for the fixes (particularly the missing quotation!) and the
suggestions.
With regards to
> And maybe -b,-qq,-X and -q,-r respectively could be added by default?
>
>
I would prefer not to do so: if you do you get something that is
somehow "specialised", otherwise you have a totally generic "rezipper"
that might also find other applications (who knows).
BTW, that is why I added the profiles, so that there was no need to type
repetitive stuff.
Sergio
* Re: Management of opendocument (openoffice.org) files in git
2008-09-15 22:40 Sergio Callegari
2008-09-16 6:45 ` Matthieu Moy
@ 2008-09-16 7:09 ` Johannes Sixt
2008-09-16 7:41 ` Sergio Callegari
2008-09-23 11:08 ` Peter Krefting
2 siblings, 1 reply; 15+ messages in thread
From: Johannes Sixt @ 2008-09-16 7:09 UTC
To: Sergio Callegari; +Cc: git
Sergio Callegari wrote:
> if [ $# = 0 ] ; then
> tmpcopy=$(mktemp rezip.zip.XXXXXX)
> cat > $tmpcopy
> filename="$tmpcopy"
> else
> tmpcopy=""
> filename="$1"
> fi
>
> workdir=$(mktemp -d -t rezip.workdir.XXXXXX)
> curdir=$(pwd)
>
> cd $workdir
> unzip $UNZIP_OPTS "$curdir/$filename"
> zip $ZIP_OPTS "$curdir/$filename" .
> cd $curdir
> rm -fr $workdir
> if [ ! -z "$tmpcopy" ] ; then
> cat $filename
> rm $tmpcopy
> fi
You don't need a temporary zip filename in filter mode:
unzip $UNZIP_OPTS /dev/stdin # works for me, but not 100% portable
zip $ZIP_OPTS - . # writes to stdout
> then put in your .git/config something like
>
> [filter "opendocument"]
> clean = "rezip -p ODF_UNCOMPRESS"
> smudge = "rezip -p ODF_COMPRESS"
Is the smudge filter really necessary? Can't OOo work with files at
compression level 0?
-- Hannes
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 7:09 ` Johannes Sixt
@ 2008-09-16 7:41 ` Sergio Callegari
2008-09-16 7:52 ` Johannes Sixt
2008-09-16 16:04 ` Avery Pennarun
0 siblings, 2 replies; 15+ messages in thread
From: Sergio Callegari @ 2008-09-16 7:41 UTC
To: Johannes Sixt; +Cc: git
Johannes Sixt wrote:
>
> You don't need a temporary zip filename in filter mode:
>
> unzip $UNZIP_OPTS /dev/stdin # works for me, but not 100% portable
> zip $ZIP_OPTS - . # writes to stdout
>
>
The unzip documentation says "Archives read from standard input are not
yet supported", so I was a bit worried about using the /dev/stdin
thing. Might it be that there are subtle cases where unzip needs to
seek or rewind?
>> then put in your .git/config something like
>>
>> [filter "opendocument"]
>> clean = "rezip -p ODF_UNCOMPRESS"
>> smudge = "rezip -p ODF_COMPRESS"
>>
>
> Is the smudge filter really necessary? Can't OOo work with files at
> compression level 0?
>
Yes, you can live perfectly without smudge. But at times it is not that
nice. Just think of finding a directory with say 15 lectures as impress
slides taking 10 times the space it needs, particularly if you need to
pass those files to someone else. As a matter of fact, ODF xml is very
verbose and compresses particularly well having long tags.
But you might want to compress -1 rather than the default in smudge to
speed it up a little. Can be done either adding a new profile to the
script (say ODF_COMPRESS_FAST) or by adding --zip_opts -1 to the smudge
command line.
Also, we might want to add some -n suffixes to zip, to prevent it from
trying to compress a few things like .png or .jpeg images that already
have their own compression. That should gain us some speed in smudging.
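As a sketch, such a profile could look like this in the first version of the script. The name ODF_COMPRESS_FAST is the one suggested above; the suffix list is only an assumption about which embedded files are worth skipping.

```shell
#!/bin/sh
# Extra profile for the original rezip script: fastest deflate level (-1),
# and leave already-compressed images stored (zip's -n takes a
# colon-separated suffix list).
PROFILES="ODF_UNCOMPRESS ODF_COMPRESS ODF_COMPRESS_FAST"
PROFILE_UNZIP_ODF_COMPRESS_FAST='-b -qq -X'
PROFILE_ZIP_ODF_COMPRESS_FAST='-q -r -D -1 -n .png:.jpg:.jpeg'
```

The smudge line would then read: smudge = "rezip -p ODF_COMPRESS_FAST".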
In any case - but this is just my feeling - it is much more disturbing
the delay that the clean filter introduces in operations like add or
status or commit, than the one introduced by the (slower) smudge filter
in checkout. There must be some psychological reason for that.
Possibly we are "programmed" to accept waiting when we need to get
something and conversely we are impatient when someone should accept
something from us.
Sergio
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 6:45 ` Matthieu Moy
@ 2008-09-16 7:41 ` Sergio Callegari
0 siblings, 0 replies; 15+ messages in thread
From: Sergio Callegari @ 2008-09-16 7:41 UTC
Cc: git
Matthieu Moy wrote:
> Sergio Callegari <sergio.callegari@gmail.com> writes:
>
>
>> Hi,
>>
>> Management of opendocument files in git was discussed a short time ago.
>> Here is a helper script that may help achieve better density in git packs
>> containing blobs from openoffice files.
>>
>
> If you don't get "oh, sh*t, I lost data with it"-kind of feedback, can
> you add it to the wiki:
>
> http://git.or.cz/gitwiki/GitTips#head-1cdd4ab777e74f12d1ffa7f0a793e46dd06e5945
>
> Thanks,
>
>
Sure. I'll wait a few days for feedback (also from myself), then I'll
add it there.
I've already got a couple of corrections and suggestions from Paolo.
Would it be useful also to add a note about how to filter-branches with
a plain "--tree-filter true" to convert archives so that they take
advantage of storing ODF stuff uncompressed?
If proper, I can add that too.
Sergio
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 7:41 ` Sergio Callegari
@ 2008-09-16 7:52 ` Johannes Sixt
2008-09-16 16:04 ` Avery Pennarun
1 sibling, 0 replies; 15+ messages in thread
From: Johannes Sixt @ 2008-09-16 7:52 UTC
To: Sergio Callegari; +Cc: git
Sergio Callegari wrote:
> Johannes Sixt wrote:
>>
>> You don't need a temporary zip filename in filter mode:
>>
>> unzip $UNZIP_OPTS /dev/stdin # works for me, but not 100% portable
>> zip $ZIP_OPTS - . # writes to stdout
>>
>>
> The unzip documentation says "Archives read from standard input are not
> yet supported", so I was a bit worried about using the /dev/stdin
> thing. Might it be that there are subtle cases where unzip needs to
> seek or rewind?
I didn't test thoroughly nor did I read the documentation. So if the
documentation says stdin is a no-go, you better do what it says. ;)
> In any case - but this is just my feeling - it is much more disturbing
> the delay that the clean filter introduces in operations like add or
> status or commit, than the one introduced by the (slower) smudge filter
> in checkout.
My feeling is that the temporary tree that is written slows it down. If
rezip were a true filter it could be faster.
-- Hannes
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 7:05 ` Sergio Callegari
@ 2008-09-16 8:12 ` Paolo Bonzini
2008-10-02 12:52 ` Michael J Gruber
0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2008-09-16 8:12 UTC
To: Sergio Callegari; +Cc: Git Mailing List
> With regards to
>
>> And maybe -b,-qq,-X and -q,-r respectively could be added by default?
>>
>>
> I would prefer not to do so: if you do you get something that is
> somehow "specialised", otherwise you have a totally generic "rezipper"
> that might also find other applications (who knows).
Yeah, but regarding -b/-X, their effect can/should be undone with zip
command line options (-l, -ll, -X). And for zip's -r option, a rezipper
that by default only rezips the top directory does not seem very useful. :-)
You're right about letting the user specify -qq/-q. Or maybe you can
have a -q/--quiet option to rezip that adds both -qq to unzip, and -q to
zip. This way you can use the openoffice profile both in quiet mode
(for git) and in non-quiet mode (for manual use).
Putting all of this together, it would make the filter look like this:
[filter "opendocument"]
clean = "rezip --quiet --zip-opts -D,-0"
smudge = "rezip --quiet --zip-opts -D,-6"
or similarly with profiles:
[filter "opendocument"]
clean = "rezip --quiet -p ODF_UNCOMPRESS"
smudge = "rezip --quiet -p ODF_COMPRESS"
After my signature you can find my attempt at making rezip more useful
as a general program. It supports --quiet, multiple input files, and -
for stdin. (And I learnt from you that >&$foo works).
> BTW, that is why I added the profiles, so that there was no need to type
> repetitive stuff.
Understood.
Paolo
-----8<-----------------------
#! /bin/sh
#
# (c) 2008 Sergio Callegari
#
# Rewrites a zip archive, possibly changing the compression level
USAGE='Usage: rezip OPTIONS FILE...
with options:
-h, --help Gives help
--unzip-opts OPTIONS Pass options to unzip helper to read zip file
--zip-opts OPTIONS Pass options to zip helper to write zip file
-p, --profile PROFILE Get options for helpers from profile
-q, --quiet Make unzip and zip quiet
Rewrites a zip archive, possibly changing the compression level.
If the archive name is unspecified or "-", then the command operates
like a filter, reading from standard input and writing to standard
output. Options (either space- or comma-separated) can be manually
provided to the unzip process doing the read and to the zip process
doing the write. Alternatively a profile can be used to set options
automatically.'
PROFILES="ODF_UNCOMPRESS ODF_COMPRESS"
PROFILE_UNZIP_ODF_UNCOMPRESS=
PROFILE_ZIP_ODF_UNCOMPRESS=-D,-0
PROFILE_UNZIP_ODF_COMPRESS=
PROFILE_ZIP_ODF_COMPRESS=-D,-6
die()
{
echo "$3$USAGE
Available profiles: ${PROFILES}" >&$1
exit $2
}
UNZIP_OPTS=""
ZIP_OPTS=""
while true ; do
case "$1" in
-h | --help)
die 1 0 ;;
# TODO: handle -p*, --profile=* and similarly for other options
-p | --profile)
eval UNZIP_OPTS=\$PROFILE_UNZIP_$2
eval ZIP_OPTS=\$PROFILE_ZIP_$2
shift ;;
--unzip-opts)
UNZIP_OPTS="${UNZIP_OPTS} $2"
shift ;;
--zip-opts)
ZIP_OPTS="${ZIP_OPTS} $2"
shift ;;
-q | --quiet)
UNZIP_QUIET=-qq
ZIP_QUIET=-q ;;
-*)
die 2 1 "Invalid option: $1
" ;;
*)
break ;;
esac
shift
done
UNZIP_OPTS="$UNZIP_QUIET -b -X `echo $UNZIP_OPTS | sed 'y/,/ /'`"
ZIP_OPTS="$ZIP_QUIET -r `echo $ZIP_OPTS | sed 'y/,/ /'`"
if [ $# = 0 ] ; then
set fnord -
shift
fi
redir=1
for filename
do
if [ "$filename" = - ]; then
redir=2
break
fi
done
for filename
do
workdir=`mktemp -d -t rezip.workdir.XXXXXX`
if [ "$filename" = - ]; then
tmpcopy=:
filename=`mktemp rezip.zip.XXXXXX`
cat > "$filename"
else
tmpcopy=false
fi
(case $filename in
/*) ;;
*) filename=`pwd`/$filename ;;
esac
cd "$workdir"
unzip $UNZIP_OPTS "$filename" >&$redir
zip $ZIP_OPTS "$filename" . >&$redir)
rm -fr "$workdir"
if $tmpcopy ; then
cat "$filename"
rm "$filename"
fi
done
--------8<------------------------
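The case statement inside the subshell is there because the script changes into the temporary work directory before calling unzip and zip, so a relative file name would no longer resolve. The same step in isolation, as a small sketch (absolutize is a hypothetical name):

```shell
#!/bin/sh
# Print an absolute version of a path without resolving symlinks:
# absolute paths pass through, relative ones get the current directory
# prepended, exactly as the case statement in the script does.
absolutize() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;
    esac
}
```

This has to be called before the cd, which is why the script does the conversion first thing inside the subshell.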
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 7:41 ` Sergio Callegari
2008-09-16 7:52 ` Johannes Sixt
@ 2008-09-16 16:04 ` Avery Pennarun
2008-09-16 19:28 ` Stephen R. van den Berg
2008-09-16 21:13 ` Robin Rosenberg
1 sibling, 2 replies; 15+ messages in thread
From: Avery Pennarun @ 2008-09-16 16:04 UTC
To: Sergio Callegari; +Cc: Johannes Sixt, git
On Tue, Sep 16, 2008 at 3:41 AM, Sergio Callegari
<sergio.callegari@gmail.com> wrote:
> Johannes Sixt wrote:
>>
>> You don't need a temporary zip filename in filter mode:
>>
>> unzip $UNZIP_OPTS /dev/stdin # works for me, but not 100% portable
>> zip $ZIP_OPTS - . # writes to stdout
>>
>>
>
> The unzip documentation says "Archives read from standard input are not yet
> supported", so I was a bit worried about using the /dev/stdin thing. Might
> it be that there are subtle cases where unzip needs to seek or rewind?
IIRC zip files keep their index at the end of the file, which means
zipping in a pipeline is efficient (you can write all the blocks
first, then drop the final index at the end) but unzipping that way is
really hard.
unzipping from /dev/stdin seems to work if stdin is seekable, otherwise not.
unzip /dev/stdin <filename.zip # works
cat filename.zip | unzip /dev/stdin # doesn't work
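Git feeds filters through a pipe, so a portable filter probably has to spool its input to a regular (seekable) file first, which is what the temporary copy in the original script does. A minimal sketch of just that step, with a hypothetical function name:

```shell
#!/bin/sh
# Copy standard input to a fresh temporary file and print its name.
# The caller is responsible for removing the file afterwards.
spool_stdin() {
    spool=$(mktemp "${TMPDIR:-/tmp}/rezip.zip.XXXXXX") || return 1
    cat > "$spool" || return 1
    printf '%s\n' "$spool"
}

# Possible use inside a filter (not run here):
#   archive=$(spool_stdin) || exit 1
#   unzip -qq "$archive" -d "$workdir"
#   rm -f "$archive"
```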
Have fun,
Avery
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 16:04 ` Avery Pennarun
@ 2008-09-16 19:28 ` Stephen R. van den Berg
2008-09-16 21:13 ` Robin Rosenberg
1 sibling, 0 replies; 15+ messages in thread
From: Stephen R. van den Berg @ 2008-09-16 19:28 UTC
To: Avery Pennarun; +Cc: Sergio Callegari, Johannes Sixt, git
Avery Pennarun wrote:
>On Tue, Sep 16, 2008 at 3:41 AM, Sergio Callegari
><sergio.callegari@gmail.com> wrote:
>> Johannes Sixt wrote:
>IIRC zip files keep their index at the end of the file, which means
>zipping in a pipeline is efficient (you can write all the blocks
>first, then drop the final index at the end) but unzipping that way is
>really hard.
Well, the index *is* at the end, yes, however, almost all (if not all) the
information in the index is present directly in front of the files as
well, so unzipping from stdin is possible without seeks (though the
standard unzip doesn't support that (yet) because it tries to verify
integrity and speed up lists using the index at the end).
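One visible consequence of the per-file headers: an ordinary non-empty archive begins with the local file header signature "PK\003\004" rather than with the index. A small sketch that checks for it, which a filter might use as a sanity test before rezipping (the function name is hypothetical, and an empty archive would not pass):

```shell
#!/bin/sh
# Succeed if the file starts with the zip local file header signature
# (the four bytes 'P' 'K' 0x03 0x04).
has_local_header() {
    [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "$(printf 'PK\003\004')" ]
}
```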
--
Sincerely,
Stephen R. van den Berg.
Human beings were created by water to transport it uphill.
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 16:04 ` Avery Pennarun
2008-09-16 19:28 ` Stephen R. van den Berg
@ 2008-09-16 21:13 ` Robin Rosenberg
1 sibling, 0 replies; 15+ messages in thread
From: Robin Rosenberg @ 2008-09-16 21:13 UTC
To: Avery Pennarun; +Cc: Sergio Callegari, Johannes Sixt, git
On Tuesday, 16 September 2008 at 18:04:44, Avery Pennarun wrote:
> unzipping from /dev/stdin seems to work if stdin is seekable, otherwise not.
>
> unzip /dev/stdin <filename.zip # works
> cat filename.zip | unzip /dev/stdin # doesn't work
Try a cousin of zip for extraction:
cat filename.zip | jar x # works
> Have fun,
Always.
-- robin
* Re: Management of opendocument (openoffice.org) files in git
2008-09-15 22:40 Sergio Callegari
2008-09-16 6:45 ` Matthieu Moy
2008-09-16 7:09 ` Johannes Sixt
@ 2008-09-23 11:08 ` Peter Krefting
2 siblings, 0 replies; 15+ messages in thread
From: Peter Krefting @ 2008-09-23 11:08 UTC
To: Sergio Callegari; +Cc: git
Sergio Callegari:
> To try it, save the following as "rezip" with execution permission:
I had some problems when I tried to implement this on a Windows machine:
it did not handle paths with spaces in them properly, and "Documents
and Settings" does contain spaces.
The following patch fixes that for me:
---
rezip | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/rezip b/rezip
index 15f83a4..845e875 100755
--- a/rezip
+++ b/rezip
@@ -66,7 +66,7 @@ done
if [ $# = 0 ] ; then
tmpcopy=$(mktemp rezip.zip.XXXXXX)
- cat > $tmpcopy
+ cat > "$tmpcopy"
filename="$tmpcopy"
else
tmpcopy=""
@@ -76,12 +76,12 @@ fi
workdir=$(mktemp -d -t rezip.workdir.XXXXXX)
curdir=$(pwd)
-cd $workdir
+cd "$workdir"
unzip $UNZIP_OPTS "$curdir/$filename"
zip $ZIP_OPTS "$curdir/$filename" .
-cd $curdir
-rm -fr $workdir
+cd "$curdir"
+rm -fr "$workdir"
if [ ! -z "$tmpcopy" ] ; then
- cat $filename
- rm $tmpcopy
+ cat "$filename"
+ rm "$tmpcopy"
fi
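For readers wondering why the quotes matter, here is a tiny illustration (count_args is a hypothetical helper; behaviour as in sh or bash with the default IFS):

```shell
#!/bin/sh
# Print how many arguments the function received.
count_args() { echo "$#"; }

dir="Documents and Settings"

count_args $dir      # prints 3: the unquoted value is split at each space
count_args "$dir"    # prints 1: quoting keeps the path in one piece
```

An unquoted cd $workdir therefore hands cd several separate words instead of one directory name as soon as the temporary directory lives under such a path.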
--
\\// Peter - http://www.softwolves.pp.se/
* Re: Management of opendocument (openoffice.org) files in git
2008-09-16 8:12 ` Paolo Bonzini
@ 2008-10-02 12:52 ` Michael J Gruber
2008-10-10 8:12 ` Peter Krefting
0 siblings, 1 reply; 15+ messages in thread
From: Michael J Gruber @ 2008-10-02 12:52 UTC
To: Paolo Bonzini; +Cc: Sergio Callegari, Git Mailing List
Following up on the discussion about tracking oo files I conducted a
minimalistic test. I simulated tracking an oo spreadsheet, where from
one version to the next only a few cells would be entered in an existing
spreadsheet. These are the sizes of the individual files:
48K 0.ods
48K 1.ods
60K 2.ods
60K 3.ods
56K 4.ods
64K 5.ods
68K 6.ods
64K 7.ods
64K 8.ods
68K 9.ods
600K total
I then tracked this in four different ways, each in a fresh repo:
"packed": copy $i.ods to t.ods as is, git add t.ods and commit.
"unpacked": use the unzipped contents of $i.ods instead.
"rezip": use the rezipped version (compression 0, using Sergio's script).
"oofilter": use clean/smudge filters (calling Sergio's rezip)
Here are the resulting sizes: first ".git/objects" as is, then after
repacking -adf, finally the total size of .git + the work tree (i.e. the
last revision).
packed
708K .git/objects
492K .git/objects
692K .git + wt
unpacked
1,3M .git/objects
144K .git/objects
1,5M .git + wt
rezip
992K .git/objects
148K .git/objects
1,4M .git + wt
oofilter
984K .git/objects
148K .git/objects
352K .git + wt
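Figures like these could be collected with something along the following lines. This is a sketch with hypothetical function names, not the commands actually used for the table; it reports KiB from du -sk rather than the human-readable sizes above.

```shell
#!/bin/sh
# Size of the object store of the repository at $1, in KiB.
objects_kb() {
    du -sk "$1/.git/objects" | cut -f1
}

# Size of the whole repository directory ($1), i.e. .git plus work tree.
total_kb() {
    du -sk "$1" | cut -f1
}

# Print the three figures for one repository: objects as is, objects
# after a full repack (-f recomputes deltas), and .git + work tree.
report_sizes() {
    echo "objects:          $(objects_kb "$1") KiB"
    (cd "$1" && git repack -a -d -f -q) || return 1
    echo "objects repacked: $(objects_kb "$1") KiB"
    echo ".git + wt:        $(total_kb "$1") KiB"
}
```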
Unsurprisingly, the total size is dominated by the work tree size if you
have few revisions. (Also, templates and such contribute.)
Note that git log --stat will report the sizes of packed files in the
first case, but the sizes of unpacked files in all other cases. In
particular, it reports a different size for the HEAD revision than you
have in a HEAD checkout.
I tried rewriting "packed" after configuring the filters: filter-branch
refuses to work with a dirty work-tree, even after "checkout -f HEAD"
and "reset --hard". It seems that git status is permanently confused
here. (Has anyone successfully rewritten existing oo files?)
I'm not sure about the lessons, but I wanted to share the numbers
anyways. I think this (your script and its usage) is heading in a useful
direction and should maybe be made better known, if not made easier from the
git side. Also I'm still looking for a good (deterministic) pdf
recompressor.
Michael
git version 1.6.0.2.426.g2cfa6
* Re: Management of opendocument (openoffice.org) files in git
2008-10-02 12:52 ` Michael J Gruber
@ 2008-10-10 8:12 ` Peter Krefting
0 siblings, 0 replies; 15+ messages in thread
From: Peter Krefting @ 2008-10-10 8:12 UTC
To: Michael J Gruber; +Cc: Git Mailing List
Michael J Gruber:
> I'm not sure about the lessons, but I wanted to share the numbers
> anyways. I think this (your script and its usage) is heading in a useful
> direction and should maybe be made better known, if not made easier from the
> git side.
I had very positive experiences with the script for my use-case. I did
post them to the list, but it seems as if they got lost. At least I
can't seem to find them, did they show up?
I had some problems with the script when trying to run it under
Windows, though. Running Windows-Git from a Cygwin prompt provides some
confusion about some of the Unix tools' behaviour that I needed to
work around (like removing "/cygdrive" prefixes).
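The exact workaround is not shown in this message. Purely as an illustration of the kind of rewriting meant, and assuming the usual Cygwin layout /cygdrive/<letter>/..., a prefix could be turned into a drive-letter path like this (strip_cygdrive is a hypothetical name; Cygwin's own cygpath tool may be the better choice where available):

```shell
#!/bin/sh
# Rewrite "/cygdrive/c/some/path" to "c:/some/path"; other paths are
# printed unchanged. Assumes a single-letter drive name.
strip_cygdrive() {
    printf '%s\n' "$1" | sed 's|^/cygdrive/\([A-Za-z]\)/|\1:/|'
}
```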
--
\\// Peter - http://www.softwolves.pp.se/