From: Michael J Gruber <git@drmicha.warpmail.net>
To: Paolo Bonzini <bonzini@gnu.org>
Cc: Sergio Callegari <sergio.callegari@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Management of opendocument (openoffice.org) files in git
Date: Thu, 02 Oct 2008 14:52:17 +0200
Message-ID: <48E4C401.90409@drmicha.warpmail.net>
In-Reply-To: <48CF6A7C.4020604@gnu.org>

Following up on the discussion about tracking oo files, I conducted a
minimalistic test. I simulated tracking an oo spreadsheet where, from
one version to the next, only a few cells are filled in in an existing
spreadsheet. These are the sizes of the individual files:

48K     0.ods
48K     1.ods
60K     2.ods
60K     3.ods
56K     4.ods
64K     5.ods
68K     6.ods
64K     7.ods
64K     8.ods
68K     9.ods
600K    total

I then tracked this in four different ways, each in a fresh repo:

"packed": copy $i.ods to t.ods as is, git add t.ods and commit.
"unpacked": use the unzipped contents of $i.ods instead.
"rezip": use the rezipped version (compression 0, using Sergio's script).
"oofilter": use clean/smudge filters (calling Sergio's rezip)
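To make the "rezip" step concrete, here is a minimal Python sketch of the idea: rewrite the zip container so that every member is stored with compression 0, which lets git's delta compression see the XML inside. This is not Sergio's script; the names rezip_stored and clean_filter are made up for illustration, and member order is kept as is so that the "mimetype" entry stays first.

```python
import io
import zipfile


def rezip_stored(data: bytes) -> bytes:
    """Return a copy of the zip archive in `data` with all members stored
    uncompressed (compression 0). Hypothetical helper, not Sergio's script."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
        # infolist() preserves the original member order.
        for info in src.infolist():
            # Reuse name and timestamp from the source so the output depends
            # only on the input, not on when the rewrite happens.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_STORED
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
    return out.getvalue()


def clean_filter(stdin, stdout) -> None:
    """Shape of a git clean filter: document on stdin, stored zip on stdout.
    Assumes binary streams such as sys.stdin.buffer and sys.stdout.buffer."""
    stdout.write(rezip_stored(stdin.read()))
```

For the "oofilter" variant, a function like clean_filter would be wired in through a filter.<name>.clean setting plus a matching "filter=<name>" line in .gitattributes; the smudge side can recompress or simply pass the data through, since the stored file is still a valid zip.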

Here are the resulting sizes: first ".git/objects" as is, then after
"git repack -adf", and finally the total size of .git + the work tree
(i.e. the last revision).

packed
708K    .git/objects (as is)
492K    .git/objects (repacked)
692K    .git + wt

unpacked
1.3M    .git/objects (as is)
144K    .git/objects (repacked)
1.5M    .git + wt

rezip
992K    .git/objects (as is)
148K    .git/objects (repacked)
1.4M    .git + wt

oofilter
984K    .git/objects (as is)
148K    .git/objects (repacked)
352K    .git + wt
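
As a quick reading aid, the repacked object sizes can be compared against the "packed" case with a few lines of Python. The numbers are copied from the results above (second line of each group, in K as reported); the variable names are only for illustration.

```python
# Size of .git/objects after repacking, in K, copied from the results above.
repacked_k = {"packed": 492, "unpacked": 144, "rezip": 148, "oofilter": 148}

baseline = repacked_k["packed"]
# How many times larger the "packed" repository is than each variant.
ratios = {name: baseline / size for name, size in repacked_k.items()}

for name, ratio in ratios.items():
    print(f"{name:9s} {repacked_k[name]:4d}K  packed/this = {ratio:.2f}")
```

So once repacked, storing the uncompressed contents takes roughly a third of the space of storing the compressed .ods files directly, at least for these ten revisions.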

Unsurprisingly, the total size is dominated by the work tree size if you
have few revisions. (Also, templates and such contribute.)
Note that git log --stat will report the sizes of packed files in the
first case, but the sizes of unpacked files in all other cases. In
particular, it reports a different size for the HEAD revision than you
have in a HEAD checkout.

I tried rewriting "packed" after configuring the filters: filter-branch
refuses to work with a dirty work-tree, even after "checkout -f HEAD"
and "reset --hard". It seems that git status is permanently confused
here. (Has anyone successfully rewritten existing oo files?)

I'm not sure about the lessons, but I wanted to share the numbers
anyway. I think this (your script and its usage) is heading in a useful
direction and should maybe be made better known, if not made easier from
the git side. Also, I'm still looking for a good (deterministic) PDF
recompressor.

Michael

git version 1.6.0.2.426.g2cfa6

