From: "Dana How" <danahow@gmail.com>
To: "Marco Costalba" <mcostalba@gmail.com>
Cc: "Nicolas Pitre" <nico@cam.org>,
"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
"Sam Vilain" <sam@vilain.net>,
"Git Mailing List" <git@vger.kernel.org>,
"Junio C Hamano" <gitster@pobox.com>,
danahow@gmail.com
Subject: Re: Decompression speed: zip vs lzo
Date: Thu, 10 Jan 2008 11:34:52 -0800
Message-ID: <56b7f5510801101134l334c3eabu20513b7402cd2720@mail.gmail.com>
In-Reply-To: <e5bfff550801092255wc852252m9086567a88b1ae99@mail.gmail.com>
On Jan 9, 2008 10:55 PM, Marco Costalba <mcostalba@gmail.com> wrote:
> On Jan 10, 2008 4:41 AM, Nicolas Pitre <nico@cam.org> wrote:
> > On Wed, 9 Jan 2008, Johannes Schindelin wrote:
> >
> > > I agree that gzip is already fast enough.
> > >
> > > However, pack v4 had more goodies than just being faster; it also promised
> > > to have smaller packs.
> >
> > Right, like not having to compress tree objects and half of commit
> > objects at all.
>
> Decompression speed has been shown to be a bottle neck on some tests
> involving mainly 'git log'.
Thanks for looking into this, in this email and your subsequent ones.
I agree that zip time is an issue. I was looking into reducing the _number_
of zip calls on the same data, but work and personal crises have reduced
me from an infrequent contributor to an occasional gadfly for the moment.
> Regarding back compatibility I really don't know at what level git
> functions actually need to know the compression format, looking at the
> code I would say at very low level, functions that deal directly with
> inflate() and friends are few [1] and not directly connected to UI,
> nor to git config. Is this compression format something user should
> know/care? and if yes why?
>
> In my tests the assumption of a source files tar ball is unrealistic,
> to test the final size difference I would like testing different
> compressions on a big already packaged but still not zipped file.
> Someone could be so kind to hint me on how to create such a package
> with good quality, i.e. with packaging levels similar to what is done
> for public repos?
>
> This does not realistically tests speed because as Junio pointed out
> the real decompressing schema is different: many calls on small
> objects, not one call on a big one. But if final size is acceptable we
> can go on more difficult tests.
The approach you're taking (here and in following emails) of being
able to make zip/lzo selection and measure the results should be
enlightening. For the vast majority of git users, Junio's scenario
is the most relevant.
Of additional interest to me is handling enormous objects more quickly.
I would like to replace some p4 usage here with git, but most users will
only notice the speed difference and not use git's extra features. Thus
they will compare git add/git commit/git push unfavorably to p4 edit/p4 submit,
because the former effectively does zip/unzip/zip/send, while the latter
only does zip/send (git's extra "unzip/zip" comes from loose objects not
being directly copyable into packs). This speed difference is irrelevant
for small to normal files, but a killer when committing a collection of, say,
100MB files.
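
To make the zip/unzip/zip vs. zip comparison concrete, here is a rough
sketch that counts zlib passes over the data. It uses Python's zlib as a
stand-in; the function names are made up for illustration, and it ignores
object headers, deltas and everything else git and p4 really do. It counts
passes, not time (inflate is cheaper than deflate).

```python
import zlib

def loose_then_pack(data: bytes) -> tuple[bytes, int]:
    """Toy model of 'zip/unzip/zip': returns the packed bytes and the pass count."""
    passes = 0
    loose = zlib.compress(data)    # add: deflate into a loose object
    passes += 1
    raw = zlib.decompress(loose)   # pack: inflate the loose object again
    passes += 1
    packed = zlib.compress(raw)    # pack: deflate into the pack to be sent
    passes += 1
    return packed, passes

def compress_once(data: bytes) -> tuple[bytes, int]:
    """Toy model of 'zip/send': a single deflate pass."""
    return zlib.compress(data), 1

payload = b"large file contents " * 50_000   # stand-in for a big file
packed, git_passes = loose_then_pack(payload)
sent, p4_passes = compress_once(payload)
print(git_passes, p4_passes)
```

Both paths end with the same compressed payload; the first just touches
every byte three times to get there.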
Your lzo option could reduce this performance degradation vs p4 from
3x to close to 1.5x. If you get it accepted, I'd love to then "fix" the loose
object copying "problem" making git _faster_ than p4 on large files!
Two simple forms for this "fix" would be to use the once-and-future "new"
loose object format (an idea already rejected), or to encode all loose
objects as singleton packs under .git/objects/xx (so that all (re)packing,
in the absence of new deltification, becomes pack-to-pack copying).
This latter idea is a modification of an idea from Nicolas Pitre.
It certainly adds less code than other approaches for such a "fix".
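
A toy sketch of why pack-to-pack copying helps, again with Python's zlib
and hypothetical names (this is not git's pack format or API): if each
object is compressed once when it is added, a later repack without new
deltification can reuse the compressed bytes verbatim and make no zlib
calls at all.

```python
import zlib

def write_singleton(data: bytes) -> bytes:
    """Compress an object once, at add time (stand-in for a one-object pack)."""
    return zlib.compress(data)

def repack(entries: list[bytes]) -> list[bytes]:
    """Toy pack-to-pack copy: reuse each compressed entry as-is, no inflate/deflate."""
    return [bytes(entry) for entry in entries]

objects = [b"a" * 1000, b"b" * 2000]
singletons = [write_singleton(obj) for obj in objects]
pack = repack(singletons)
```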
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell