* Management of opendocument (openoffice.org) files in git
From: Sergio Callegari @ 2008-09-15 22:40 UTC (permalink / raw)
  To: git

Hi,

Management of opendocument files in git has been discussed a short time ago.
Here is a helper script that may help achieve better density in git packs
containing blobs from openoffice files.

To try it, save the following as "rezip" with execution permission:

-----8<----------------------- 

#! /bin/bash
#
# (c) 2008 Sergio Callegari
#
# Rewrites a zip archive, possibly changing the compression level

USAGE='Usage: rezip [options] [file]
with options:
  [-h | --help]            Gives help
  [-p ?]                   Lists known profiles
  [--unzip_opts options]   Pass options to unzip helper to read zip file
  [--zip_opts options]     Pass options to zip helper to write zip file
  [-p | --profile profile] Get options for helpers from profile

Rewrites a zip archive, possibly changing the compression level.
If the archive name is unspecified, then the command operates like a filter,
reading from standard input and writing to standard output.
Options can be manually provided to the unzip process doing the read and to
the zip process doing the write. Alternatively a profile can be used to set
options automatically.'

PROFILES="ODF_UNCOMPRESS ODF_COMPRESS"

PROFILE_UNZIP_ODF_UNCOMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_UNCOMPRESS='-q -r -D -0'
PROFILE_UNZIP_ODF_COMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_COMPRESS='-q -r -D -6'

die()
{
    echo "$1" >&$2
    exit $3
}

UNZIP_OPTS=""
ZIP_OPTS=""

while true ; do
    case "$1" in
        -h | --help)
            die "$USAGE" 1 0 ;;
        -p | --profile)
            if [ "$2" = "?" ] ; then
                die "Available profiles: ${PROFILES}" 1 0 ;
            else
                profile=$2
                shift
                profile_unzip=PROFILE_UNZIP_${profile}
                profile_zip=PROFILE_ZIP_${profile}
                UNZIP_OPTS=${!profile_unzip}
                ZIP_OPTS=${!profile_zip}
            fi ;;
        --unzip_opts)
            UNZIP_OPTS=${UNZIP_OPTS} $2
            shift ;;
        --zip_opts)
            ZIP_OPTS=${ZIP_OPTS} $2
            shift ;;
        -*)
            die "$USAGE" 2 1 ;;
        *)
            break ;;
    esac
    shift
done

if [ $# = 0 ] ; then
    tmpcopy=$(mktemp rezip.zip.XXXXXX)
    cat > $tmpcopy
    filename="$tmpcopy"
else
    tmpcopy=""
    filename="$1"
fi

workdir=$(mktemp -d -t rezip.workdir.XXXXXX)
curdir=$(pwd)

cd $workdir
unzip $UNZIP_OPTS "$curdir/$filename"
zip $ZIP_OPTS "$curdir/$filename" .
cd $curdir
rm -fr $workdir
if [ ! -z "$tmpcopy" ] ; then
  cat $filename
  rm $tmpcopy
fi

--------8<------------------------

then put in your .git/config something like

[filter "opendocument"]
        clean = "rezip -p ODF_UNCOMPRESS"
        smudge = "rezip -p ODF_COMPRESS"

and finally set gitattributes as

*.odt filter=opendocument
*.ods filter=opendocument
*.odp filter=opendocument
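The same configuration can also be set up from a shell. A minimal sketch, assuming git is installed; it works in a throwaway repository under $TMPDIR, and rezip only needs to be on the PATH later, when git actually runs the filters:

```sh
#!/bin/sh
# Sketch: the filter setup done with git commands instead of editing
# .git/config by hand, in a throwaway repository.
repo=$(mktemp -d "${TMPDIR:-/tmp}/rezip-demo.XXXXXX") || exit 1
cd "$repo" || exit 1
git init -q

# Same effect as the [filter "opendocument"] section above.
git config filter.opendocument.clean "rezip -p ODF_UNCOMPRESS"
git config filter.opendocument.smudge "rezip -p ODF_COMPRESS"

# Same effect as the three gitattributes lines above.
for ext in odt ods odp; do
    printf '*.%s filter=opendocument\n' "$ext"
done > .gitattributes

# Ask git which filter applies to a (hypothetical) document.
git check-attr filter -- lectures.odp
```

The last command should report "lectures.odp: filter: opendocument".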

Note: with this you might experience some delay on operations like

  git status
  git add
  git commit -a
  git checkout

depending on the size of the opendocument files being tracked.

Before using on anything sensitive, please test that it does what it should.

The script should probably be made more robust against unexpected situations.

Hope it can be useful to someone.

Sergio

* Re: Management of opendocument (openoffice.org) files in git
From: Paolo Bonzini @ 2008-09-16  6:24 UTC (permalink / raw)
  To: Sergio Callegari, Git Mailing List

>                 profile_unzip=PROFILE_UNZIP_${profile}
>                 profile_zip=PROFILE_ZIP_${profile}
>                 UNZIP_OPTS=${!profile_unzip}
>                 ZIP_OPTS=${!profile_zip}

Can be written (in pure Bourne shell) as

  eval UNZIP_OPTS=\$PROFILE_UNZIP_${profile}
  eval ZIP_OPTS=\$PROFILE_ZIP_${profile}
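Side by side, the eval idiom reads the same variables as bash's ${!name}. A minimal sketch using the profile table from the original script; the load_profile name and the case guard that rejects odd profile names are assumptions, not part of the thread (without such a guard, eval would run whatever text follows -p):

```sh
#!/bin/sh
# Profile table as in the original script.
PROFILE_UNZIP_ODF_UNCOMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_UNCOMPRESS='-q -r -D -0'
PROFILE_UNZIP_ODF_COMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_COMPRESS='-q -r -D -6'

# Hypothetical helper: portable replacement for ${!profile_unzip} and
# ${!profile_zip}. The case guard only lets letters, digits and
# underscores through, so eval cannot execute injected code.
load_profile() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*)
            echo "rezip: bad profile name: $1" >&2
            return 1 ;;
    esac
    eval "UNZIP_OPTS=\$PROFILE_UNZIP_$1"
    eval "ZIP_OPTS=\$PROFILE_ZIP_$1"
}

load_profile ODF_COMPRESS && echo "unzip: $UNZIP_OPTS / zip: $ZIP_OPTS"
```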

>         --unzip_opts)
>             UNZIP_OPTS=${UNZIP_OPTS} $2

Missing quotes:

  UNZIP_OPTS="${UNZIP_OPTS} $2"
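The reason the quotes matter: in "NAME=value word" the shell runs "word" as a command, with NAME set only in that command's environment. So the unquoted line never appends $2 to UNZIP_OPTS; it tries to execute the option as a command instead. A small demonstration, with -qq standing in for the user's option string:

```sh
#!/bin/sh
UNZIP_OPTS=-b
set -- --unzip_opts -qq    # pretend these were the arguments given to rezip

# Unquoted (buggy) form: the shell runs "-qq" as a command, with UNZIP_OPTS
# set only in that command's environment. The command is not found and
# UNZIP_OPTS in the script itself is left unchanged. ("|| :" only keeps
# the demonstration going after the failure.)
UNZIP_OPTS=${UNZIP_OPTS} $2 2>/dev/null || :
after_unquoted=$UNZIP_OPTS
echo "unquoted: $after_unquoted"    # prints: unquoted: -b

# Quoted form: a single assignment, as intended.
UNZIP_OPTS="${UNZIP_OPTS} $2"
echo "quoted:   $UNZIP_OPTS"        # prints: quoted:   -b -qq
```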

It could also be a good idea to do

  UNZIP_OPTS="${UNZIP_OPTS} `echo $2 | sed 'y/,/ /' `"

(compare with the -Wa/-Wl/-Wp options of gcc) so you can do

  [filter "opendocument"]
        clean = "rezip --unzip-opts -b,-qq,-X --zip-opts -q,-r,-D,-0"
        smudge = "rezip --unzip-opts -b,-qq,-X --zip-opts -q,-r,-D,-6"
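The comma convention can be tried out on its own. A minimal sketch; split_opts is a hypothetical helper wrapping the sed 'y/,/ /' transliteration suggested above:

```sh
#!/bin/sh
# Hypothetical helper: turn gcc-style comma lists ("-q,-r,-D,-0") into the
# space-separated form zip and unzip expect. sed's y command maps every ","
# to " ", so this only works for options that contain no commas themselves.
# printf is used instead of echo: with echo, an option list starting with
# zip's own -n flag would be taken as echo's -n option and dropped.
split_opts() {
    printf '%s\n' "$*" | sed 'y/,/ /'
}

ZIP_OPTS=$(split_opts -q,-r,-D,-0)
echo "zip $ZIP_OPTS"    # prints: zip -q -r -D -0
```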

And maybe -b,-qq,-X and -q,-r respectively could be added by default?

Anyway, nice script, thanks!

Paolo

* Re: Management of opendocument (openoffice.org) files in git
From: Matthieu Moy @ 2008-09-16  6:45 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: git

Sergio Callegari <sergio.callegari@gmail.com> writes:

> Hi,
>
> Management of opendocument files in git has been discussed a short time ago.
> Here is a helper script that may help achieve better density in git packs
> containing blobs from openoffice files.

If you don't get "oh, sh*t, I lost data with it"-kind of feedback, can
you add it to the wiki:

http://git.or.cz/gitwiki/GitTips#head-1cdd4ab777e74f12d1ffa7f0a793e46dd06e5945

Thanks,

-- 
Matthieu

* Re: Management of opendocument (openoffice.org) files in git
From: Sergio Callegari @ 2008-09-16  7:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Git Mailing List

Thanks for the fixes (particularly the missing quotes!) and the
suggestions.

With regard to

> And maybe -b,-qq,-X and -q,-r respectively could be added by default?

I would prefer not to do so: if you do, you get something that is
somewhat "specialised"; otherwise you have a totally generic "rezipper"
that might also find other applications (who knows).
BTW, that is why I added the profiles, so that there was no need to type
repetitive stuff.

Sergio

* Re: Management of opendocument (openoffice.org) files in git
From: Johannes Sixt @ 2008-09-16  7:09 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: git

Sergio Callegari schrieb:
> if [ $# = 0 ] ; then
>     tmpcopy=$(mktemp rezip.zip.XXXXXX)
>     cat > $tmpcopy
>     filename="$tmpcopy"
> else
>     tmpcopy=""
>     filename="$1"
> fi
> 
> workdir=$(mktemp -d -t rezip.workdir.XXXXXX)
> curdir=$(pwd)
> 
> cd $workdir
> unzip $UNZIP_OPTS "$curdir/$filename"
> zip $ZIP_OPTS "$curdir/$filename" .
> cd $curdir
> rm -fr $workdir
> if [ ! -z "$tmpcopy" ] ; then
>   cat $filename
>   rm $tmpcopy
> fi

You don't need a temporary zip filename in filter mode:

  unzip $UNZIP_OPTS /dev/stdin  # works for me, but not 100% portable
  zip $ZIP_OPTS - .             # writes to stdout

> then put in your .git/config something like
> 
> [filter "opendocument"]
>         clean = "rezip -p ODF_UNCOMPRESS"
>         smudge = "rezip -p ODF_COMPRESS"

Is the smudge filter really necessary? Can't OOo work with files at
compression level 0?

-- Hannes

* Re: Management of opendocument (openoffice.org) files in git
From: Sergio Callegari @ 2008-09-16  7:41 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt wrote:
>
> You don't need a temporary zip filename in filter mode:
>
>   unzip $UNZIP_OPTS /dev/stdin  # works for me, but not 100% portable
>   zip $ZIP_OPTS - .             # writes to stdout
>
>   
The unzip documentation says "Archives read from standard input are not 
yet supported", so I was a bit worried about using the /dev/stdin 
thing.  Might it be that there are subtle cases where unzip needs to 
seek or rewind?
>> then put in your .git/config something like
>>
>> [filter "opendocument"]
>>         clean = "rezip -p ODF_UNCOMPRESS"
>>         smudge = "rezip -p ODF_COMPRESS"
>>     
>
> Is the smudge filter really necessary? Can't OOo work with files at
> compression level 0?
>   
Yes, you can live perfectly well without smudge, but at times it is not that
nice. Just think of finding a directory with, say, 15 lectures as Impress
slides taking 10 times the space they need, particularly if you need to
pass those files to someone else. As a matter of fact, ODF XML is very
verbose and, with its long tags, compresses particularly well.
But you might want to compress at level -1 in smudge, rather than the -6
of the ODF_COMPRESS profile, to speed it up a little. That can be done
either by adding a new profile to the script (say ODF_COMPRESS_FAST) or by
adding --zip_opts -1 to the smudge command line.
Also, we might want to pass some -n suffixes to zip, to prevent it from
trying to compress things like .png or .jpeg images that have their own
compression. That should gain us some speed in smudging.
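Sergio's two suggestions could be combined into one extra profile. A sketch only: the profile name comes from the message above, while the exact -n suffix list (zip takes the suffixes as a colon-separated list) is an assumption. The lookup uses the portable eval form rather than bash's ${!name}:

```sh
#!/bin/sh
# Profile table from the original rezip, plus a hypothetical fast profile:
# level -1 instead of -6, and -n so that already-compressed images are
# simply stored instead of being recompressed.
PROFILES="ODF_UNCOMPRESS ODF_COMPRESS ODF_COMPRESS_FAST"

PROFILE_UNZIP_ODF_UNCOMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_UNCOMPRESS='-q -r -D -0'
PROFILE_UNZIP_ODF_COMPRESS='-b -qq -X'
PROFILE_ZIP_ODF_COMPRESS='-q -r -D -6'
PROFILE_UNZIP_ODF_COMPRESS_FAST='-b -qq -X'
PROFILE_ZIP_ODF_COMPRESS_FAST='-q -r -D -1 -n .png:.jpg:.jpeg:.gif'

# What "rezip -p ODF_COMPRESS_FAST" would pick up.
profile=ODF_COMPRESS_FAST
eval "UNZIP_OPTS=\$PROFILE_UNZIP_$profile"
eval "ZIP_OPTS=\$PROFILE_ZIP_$profile"
echo "unzip $UNZIP_OPTS / zip $ZIP_OPTS"
```

With such a profile, the smudge line would read: smudge = "rezip -p ODF_COMPRESS_FAST".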

In any case (but this is just my feeling), the delay that the clean filter
introduces in operations like add, status or commit is much more disturbing
than the one introduced by the (slower) smudge filter in checkout. There must
be some psychological reason for that. Possibly we are "programmed" to accept
waiting when we need to get something, and conversely we are impatient when
someone should accept something from us.

Sergio

* Re: Management of opendocument (openoffice.org) files in git
From: Sergio Callegari @ 2008-09-16  7:41 UTC (permalink / raw)
  Cc: git

Matthieu Moy wrote:
> Sergio Callegari <sergio.callegari@gmail.com> writes:
>
>   
>> Hi,
>>
>> Management of opendocument files in git has been discussed a short time ago.
>> Here is a helper script that may help achieve better density in git packs
>> containing blobs from openoffice files.
>>     
>
> If you don't get "oh, sh*t, I lost data with it"-kind of feedback, can
> you add it to the wiki:
>
> http://git.or.cz/gitwiki/GitTips#head-1cdd4ab777e74f12d1ffa7f0a793e46dd06e5945
>
> Thanks,
>
>   
Sure. I'll wait a few days for feedback (also from myself), then I'll
add it there.
I've already got a couple of corrections and suggestions from Paolo.
Would it also be useful to add a note about how to use filter-branch with
a plain "--tree-filter true" to convert existing repositories so that they
take advantage of storing ODF files uncompressed?
If appropriate, I can add that too.


Sergio

* Re: Management of opendocument (openoffice.org) files in git
From: Johannes Sixt @ 2008-09-16  7:52 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: git

Sergio Callegari schrieb:
> Johannes Sixt wrote:
>>
>> You don't need a temporary zip filename in filter mode:
>>
>>   unzip $UNZIP_OPTS /dev/stdin  # works for me, but not 100% portable
>>   zip $ZIP_OPTS - .             # writes to stdout
>>
>>   
> The unzip documentation says "Archives read from standard input are not
> yet supported", so I was a bit worried about using the /dev/stdin
> thing.  Might it be that there are subtle cases where unzip needs to
> seek or rewind?

I didn't test thoroughly nor did I read the documentation. So if the
documentation says stdin is a no-go, you better do what it says. ;)

> In any case (but this is just my feeling), the delay that the clean filter
> introduces in operations like add, status or commit is much more disturbing
> than the one introduced by the (slower) smudge filter in checkout.

My feeling is that the temporary tree that is written slows it down. If
rezip were a true filter it could be faster.

-- Hannes

* Re: Management of opendocument (openoffice.org) files in git
From: Paolo Bonzini @ 2008-09-16  8:12 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: Git Mailing List


> With regards to
> 
>> And maybe -b,-qq,-X and -q,-r respectively could be added by default?
>>
>>   
> I would prefer not to do so:  if you do you get something that is
> somehow "specialised", otherwise you have a totally generic "rezipper"
> that might also find other applications (who knows).

Yeah, but regarding -b/-X, their effect can/should be undone with zip
command line options (-l, -ll, -X).  And for zip's -r option, a rezipper
that by default only rezips the top directory does not seem very useful. :-)

You're right about letting the user specify -qq/-q.  Or maybe you can
have a -q/--quiet option to rezip that adds both -qq to unzip, and -q to
zip.  This way you can use the openoffice profile both in quiet mode
(for git) and in non-quiet mode (for manual use).

Putting all of this together, it would make the filter look like this:

  [filter "opendocument"]
        clean = "rezip --quiet --zip-opts -D,-0"
        smudge = "rezip --quiet --zip-opts -D,-6"

or similarly with profiles:

  [filter "opendocument"]
        clean = "rezip --quiet -p ODF_UNCOMPRESS"
        smudge = "rezip --quiet -p ODF_COMPRESS"

After my signature you can find my attempt at making rezip more useful
as a general program.  It supports --quiet, multiple input files, and -
for stdin.  (And I learnt from you that >&$foo works).

> BTW, that is why I added the profiles, so that there was no need to type
> repetitive stuff.

Understood.

Paolo

-----8<-----------------------
#! /bin/sh
#
# (c) 2008 Sergio Callegari
#
# Rewrites a zip archive, possibly changing the compression level

USAGE='Usage: rezip OPTIONS FILE...
with options:
  -h, --help               Gives help
  --unzip-opts OPTIONS     Pass options to unzip helper to read zip file
  --zip-opts OPTIONS       Pass options to zip helper to write zip file
  -p, --profile PROFILE    Get options for helpers from profile
  -q, --quiet              Make unzip and zip quiet

Rewrites a zip archive, possibly changing the compression level.
If the archive name is unspecified or "-", then the command operates
like a filter, reading from standard input and writing to standard
output.  Options (either space- or comma-separated) can be manually
provided to the unzip process doing the read and to the zip process
doing the write.  Alternatively a profile can be used to set options
automatically.'

PROFILES="ODF_UNCOMPRESS ODF_COMPRESS"

PROFILE_UNZIP_ODF_UNCOMPRESS=
PROFILE_ZIP_ODF_UNCOMPRESS=-D,-0
PROFILE_UNZIP_ODF_COMPRESS=
PROFILE_ZIP_ODF_COMPRESS=-D,-6

die()
{
    echo "$3$USAGE

Available profiles: ${PROFILES}" >&$1
    exit $2
}

UNZIP_OPTS=""
ZIP_OPTS=""

while true ; do
    case "$1" in
        -h | --help)
            die 1 0 ;;

        # TODO: handle -p*, --profile=* and similarly for other options

        -p | --profile)
            eval UNZIP_OPTS=\$PROFILE_UNZIP_$2
            eval ZIP_OPTS=\$PROFILE_ZIP_$2
            shift ;;
        --unzip-opts)
            UNZIP_OPTS="${UNZIP_OPTS} $2"
            shift ;;
        --zip-opts)
            ZIP_OPTS="${ZIP_OPTS} $2"
            shift ;;
        -q | --quiet)
            UNZIP_QUIET=-qq
            ZIP_QUIET=-q ;;
        -*)
            die 2 1 "Invalid option: $1
" ;;
        *)
            break ;;
    esac
    shift
done

UNZIP_OPTS="$UNZIP_QUIET -b -X `echo $UNZIP_OPTS | sed 'y/,/ /'`"
ZIP_OPTS="$ZIP_QUIET -r `echo $ZIP_OPTS | sed 'y/,/ /'`"

if [ $# = 0 ] ; then
    set fnord -
    shift
fi

redir=1
for filename
do
    if [ "$filename" = - ]; then
        redir=2
        break
    fi
done

for filename
do
    workdir=`mktemp -d -t rezip.workdir.XXXXXX`

    if [ "$filename" = - ]; then
        tmpcopy=:
        filename=`mktemp rezip.zip.XXXXXX`
        cat > $filename
    else
        tmpcopy=false
    fi

    (case $filename in
        /*) ;;
        *) filename=`pwd`/$filename ;;
    esac
    cd "$workdir"
    unzip $UNZIP_OPTS "$filename" >&$redir
    zip $ZIP_OPTS "$filename" . >&$redir)
    rm -fr "$workdir"

    if $tmpcopy ; then
        cat "$filename"
        rm "$filename"
    fi
done

--------8<------------------------
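A note on the "set fnord -" and "shift" lines in the script above: they make "-" the only positional parameter. A bare "set -" would not do it, since set takes a lone "-" as an option marker (bash documents that it leaves the positional parameters unchanged when nothing follows). "set -- -" is the usual POSIX spelling of the same thing; a small check that both forms agree:

```sh
#!/bin/sh
# Paolo's idiom: the dummy word "fnord" stops set from reading "-" as an
# option; shift then discards the dummy again.
demo_fnord() {
    set fnord -
    shift
    echo "$# [$1]"
}

# The same result spelled with "--", which ends option parsing explicitly.
demo_dashdash() {
    set -- -
    echo "$# [$1]"
}

demo_fnord      # prints: 1 [-]
demo_dashdash   # prints: 1 [-]
```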

* Re: Management of opendocument (openoffice.org) files in git
From: Avery Pennarun @ 2008-09-16 16:04 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: Johannes Sixt, git

On Tue, Sep 16, 2008 at 3:41 AM, Sergio Callegari
<sergio.callegari@gmail.com> wrote:
> Johannes Sixt wrote:
>>
>> You don't need a temporary zip filename in filter mode:
>>
>>  unzip $UNZIP_OPTS /dev/stdin  # works for me, but not 100% portable
>>  zip $ZIP_OPTS - .             # writes to stdout
>>
>>
>
> The unzip documentation says "Archives read from standard input are not yet
> supported", so I was a bit worried about using the /dev/stdin thing.  Might
> it be that there are subtle cases where unzip needs to seek or rewind?

IIRC zip files keep their index at the end of the file, which means
zipping in a pipeline is efficient (you can write all the blocks
first, then drop the final index at the end) but unzipping that way is
really hard.
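To make the point concrete: the zip index (the central directory) is terminated by an "end of central directory" record that starts with the bytes "PK\005\006". When the archive has no trailing comment (an assumption here; a comment would shift the record), that record is exactly the last 22 bytes of the file, so an extractor has to look at the end of the file first. A sketch using only tail, od and tr; eocd_signature is a hypothetical helper name:

```sh
#!/bin/sh
# Print, as hex, the first 4 of the last 22 bytes of a file. For a zip
# archive without a comment this is the end-of-central-directory
# signature 504b0506 ("PK\005\006").
eocd_signature() {
    tail -c 22 "$1" | od -A n -t x1 -N 4 | tr -d ' \n'
}

# An empty zip archive consists of nothing but that 22-byte record:
# the signature followed by 18 zero bytes.
empty_zip=$(mktemp "${TMPDIR:-/tmp}/empty.zip.XXXXXX")
{ printf 'PK\005\006'; dd if=/dev/zero bs=1 count=18 2>/dev/null; } > "$empty_zip"

sig=$(eocd_signature "$empty_zip")
rm -f "$empty_zip"
echo "$sig"    # prints: 504b0506
```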

unzipping from /dev/stdin seems to work if stdin is seekable, otherwise not.

       unzip /dev/stdin <filename.zip    # works
       cat filename.zip | unzip /dev/stdin    # doesn't work
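Following that observation, a filter can make its input seekable by spooling the pipe into a temporary file first, which is essentially what the original rezip does with tmpcopy. A minimal sketch; with_seekable_stdin is a hypothetical helper, and the commented unzip line relies on the behaviour reported above rather than on anything documented:

```sh
#!/bin/sh
# Hypothetical helper: run a command with its standard input redirected
# from a regular (seekable) file. The incoming pipe is copied to a
# temporary file first, which is removed again afterwards.
with_seekable_stdin() {
    spool=$(mktemp "${TMPDIR:-/tmp}/rezip.spool.XXXXXX") || return 1
    cat > "$spool"
    "$@" < "$spool"
    status=$?
    rm -f "$spool"
    return $status
}

# Intended use (not run here):
#   cat filename.zip | with_seekable_stdin unzip -l /dev/stdin
printf 'hello\n' | with_seekable_stdin wc -c    # prints 6 (maybe space-padded)
```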

Have fun,

Avery

* Re: Management of opendocument (openoffice.org) files in git
From: Stephen R. van den Berg @ 2008-09-16 19:28 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Sergio Callegari, Johannes Sixt, git

Avery Pennarun wrote:
>On Tue, Sep 16, 2008 at 3:41 AM, Sergio Callegari
><sergio.callegari@gmail.com> wrote:
>> Johannes Sixt wrote:
>IIRC zip files keep their index at the end of the file, which means
>zipping in a pipeline is efficient (you can write all the blocks
>first, then drop the final index at the end) but unzipping that way is
>really hard.

Well, the index *is* at the end, yes; however, almost all (if not all) of
the information in the index is also present directly in front of each
file, so unzipping from stdin is possible without seeks. The standard unzip
doesn't support that (yet), though, because it uses the index at the end to
verify integrity and to speed up listings.
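Stephen's point can be seen in the format: every entry starts with a local file header (signature "PK\003\004") that repeats the file name, with the name length at byte offset 26 and the name itself at offset 30. A sketch that reads the first entry's name from the front of the file without touching the central directory; first_entry_name is a hypothetical helper, and it ignores complications such as data descriptors and ZIP64 that a real streaming extractor must handle. In an ODF document the first entry is normally "mimetype".

```sh
#!/bin/sh
# Read the first entry name from a zip file's leading local file header.
# Layout (little-endian): signature(4) version(2) flags(2) method(2)
# time(2) date(2) crc32(4) csize(4) usize(4) name_len(2) extra_len(2) name
first_entry_name() {
    sig=$(od -A n -t x1 -N 4 "$1" | tr -d ' \n')
    [ "$sig" = 504b0304 ] || return 1
    # The two bytes of the name length at offset 26, as unsigned decimals.
    set -- "$1" $(od -A n -t u1 -j 26 -N 2 "$1")
    dd if="$1" bs=1 skip=30 count=$(( $2 + 256 * $3 )) 2>/dev/null
}

# Build just enough of a header to try it: signature, 22 zero bytes for
# the fixed fields, name length 8, extra length 0, then the name.
demo=$(mktemp "${TMPDIR:-/tmp}/demo.zip.XXXXXX")
{
    printf 'PK\003\004'
    dd if=/dev/zero bs=1 count=22 2>/dev/null
    printf '\010\000\000\000mimetype'
} > "$demo"

name=$(first_entry_name "$demo")
rm -f "$demo"
echo "$name"    # prints: mimetype
```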
-- 
Sincerely,
           Stephen R. van den Berg.

Human beings were created by water to transport it uphill.

* Re: Management of opendocument (openoffice.org) files in git
From: Robin Rosenberg @ 2008-09-16 21:13 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Sergio Callegari, Johannes Sixt, git

On Tuesday 16 September 2008 18:04:44, Avery Pennarun wrote:
> unzipping from /dev/stdin seems to work if stdin is seekable, otherwise not.
> 
>        unzip /dev/stdin <filename.zip    # works
>        cat filename.zip | unzip /dev/stdin    # doesn't work

Try a cousin of zip for extraction:

	cat filename.zip | jar x # works

> Have fun,
Always.

-- robin

* Re: Management of opendocument (openoffice.org) files in git
From: Peter Krefting @ 2008-09-23 11:08 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: git

Sergio Callegari:

> To try it, save the following as "rezip" with execution permission:

I had some problems when I tried to set this up on a Windows machine: it
did not handle paths with spaces in them properly, and "Documents and
Settings" does contain spaces.
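For illustration, the failure comes from word splitting: an unquoted $workdir is split at whitespace before cd sees it, so cd receives several arguments. A quick check with an example path (not a real Windows temp directory):

```sh
#!/bin/sh
# Report how many arguments a command would receive.
count_args() { echo $#; }

workdir="/tmp/Documents and Settings/rezip.workdir"   # example path
unquoted=$(count_args $workdir)     # split into 3 words
quoted=$(count_args "$workdir")     # passed as 1 word
echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 3, quoted: 1
```

The patch deliberately leaves $UNZIP_OPTS and $ZIP_OPTS unquoted: for those, splitting into separate options is exactly what is wanted.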

The following patch fixes that for me:
---
 rezip |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/rezip b/rezip
index 15f83a4..845e875 100755
--- a/rezip
+++ b/rezip
@@ -66,7 +66,7 @@ done
 
 if [ $# = 0 ] ; then
     tmpcopy=$(mktemp rezip.zip.XXXXXX)
-    cat > $tmpcopy
+    cat > "$tmpcopy"
     filename="$tmpcopy"
 else
     tmpcopy=""
@@ -76,12 +76,12 @@ fi
 workdir=$(mktemp -d -t rezip.workdir.XXXXXX)
 curdir=$(pwd)
 
-cd $workdir
+cd "$workdir"
 unzip $UNZIP_OPTS "$curdir/$filename"
 zip $ZIP_OPTS "$curdir/$filename" .
-cd $curdir
-rm -fr $workdir
+cd "$curdir"
+rm -fr "$workdir"
 if [ ! -z "$tmpcopy" ] ; then
-  cat $filename
-  rm $tmpcopy
+  cat "$filename"
+  rm "$tmpcopy"
 fi
-- 
\\// Peter - http://www.softwolves.pp.se/

* Re: Management of opendocument (openoffice.org) files in git
From: Michael J Gruber @ 2008-10-02 12:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Sergio Callegari, Git Mailing List

Following up on the discussion about tracking oo files, I conducted a
minimalistic test. I simulated tracking an oo spreadsheet where, from
one version to the next, only a few cells were filled in. These are the
sizes of the individual files:

48K     0.ods
48K     1.ods
60K     2.ods
60K     3.ods
56K     4.ods
64K     5.ods
68K     6.ods
64K     7.ods
64K     8.ods
68K     9.ods
600K    total

I then tracked this in four different ways, each in a fresh repo:

"packed": copy $i.ods to t.ods as is, git add t.ods and commit.
"unpacked": use the unzipped contents of $i.ods instead.
"rezip": use the rezipped version (compression 0, using Sergio's script).
"oofilter": use clean/smudge filters (calling Sergio's rezip)

Here are the resulting sizes: first ".git/objects" as is, then after
"git repack -adf", and finally the total size of .git plus the work tree
(i.e. the last revision).

            .git/objects   .git/objects      .git + wt
            (as is)        (after repack)
packed      708K           492K              692K
unpacked    1.3M           144K              1.5M
rezip       992K           148K              1.4M
oofilter    984K           148K              352K

Unsurprisingly, the total size is dominated by the work tree size if you
have few revisions. (Also, templates and such contribute.)
Note that git log --stat will report the sizes of packed files in the
first case, but the sizes of unpacked files in all other cases. In
particular, it reports a different size for the HEAD revision than you
have in a HEAD checkout.

I tried rewriting "packed" after configuring the filters: filter-branch
refuses to work with a dirty work-tree, even after "checkout -f HEAD"
and "reset --hard". It seems that git status is permanently confused
here. (Has anyone successfully rewritten existing oo files?)

I'm not sure about the lessons, but I wanted to share the numbers
anyway. I think this (your script and its usage) is heading in a useful
direction and should perhaps be made better known, if not made easier from
the git side. Also, I'm still looking for a good (deterministic) PDF
recompressor.

Michael

git version 1.6.0.2.426.g2cfa6

* Re: Management of opendocument (openoffice.org) files in git
From: Peter Krefting @ 2008-10-10  8:12 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Git Mailing List

Michael J Gruber:

> I'm not sure about the lessons, but I wanted to share the numbers
> anyway. I think this (your script and its usage) is heading in a useful
> direction and should perhaps be made better known, if not made easier from
> the git side.

I had very positive experiences with the script for my use case. I did
post them to the list, but it seems as if they got lost. At least I
can't seem to find them; did they show up?

I had some problems with the script when trying to run it under
Windows, though. Running Windows-Git from a Cygwin prompt causes some
confusion about the behaviour of some of the Unix tools, which I needed to
work around (like removing "/cygdrive" prefixes).

-- 
\\// Peter - http://www.softwolves.pp.se/
