From: Pierre Habouzit <madcoder@debian.org>
To: Sam Vilain <sam@vilain.net>
Cc: Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Marco Costalba <mcostalba@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: Decompression speed: zip vs lzo
Date: Thu, 10 Jan 2008 10:16:07 +0100
Message-ID: <20080110091607.GA17944@artemis.madism.org>
In-Reply-To: <4785A6DB.3080007@vilain.net>

On Thu, Jan 10, 2008 at 07:02:39AM +0000, Sam Vilain wrote:
> Sam Vilain wrote:
> > I do really like LZOP as far as compression algorithms go.  It seems a
> > lot faster for not a huge loss in ratio.
> 
> Coincidentally, I read this today on an algorithm (LZMA - same as 7zip)
> which is very slow to compress, high ratio but quick decompression:
>
>   http://use.perl.org/~acme/journal/35330
>
> Which sounds excellent for squeezing those "archive packs" into even
> more ridiculously tiny spaces.

Well, lzma is excellent for *big* chunks of data, but not that impressive for
small files:

$ ll git.c git.c.gz git.c.lzma git.c.lzop
-rw-r--r-- 1 madcoder madcoder 12915 2008-01-09 13:47 git.c
-rw-r--r-- 1 madcoder madcoder  4225 2008-01-10 10:00 git.c.gz
-rw-r--r-- 1 madcoder madcoder  4094 2008-01-10 10:00 git.c.lzma
-rw-r--r-- 1 madcoder madcoder  5068 2008-01-10 09:59 git.c.lzop
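To reproduce this kind of size comparison without the command-line tools, a rough sketch using Python's standard gzip and lzma modules might look like the following. It only approximates the listing above: the levels are assumptions (level 6 for gzip, the gzip tool's default, and liblzma's default preset 6), and the helper name `compressed_sizes` is made up for illustration.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Size in bytes of `data` raw and after gzip, legacy .lzma and .xz compression.

    gzip runs at level 6 (the gzip tool's default); both LZMA variants use
    liblzma's default preset 6.
    """
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=6)),
        # FORMAT_ALONE is the legacy .lzma container: a 13-byte header
        # (properties, dictionary size, uncompressed size) before the LZMA data.
        "lzma": len(lzma.compress(data, format=lzma.FORMAT_ALONE)),
        # FORMAT_XZ adds stream/block headers, an index and a checksum, so its
        # fixed per-file overhead is larger still, which hurts tiny inputs most.
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
    }
```

Calling `compressed_sizes(open("git.c", "rb").read())` gives figures of the same kind as the listing, although exact byte counts may differ from what the 2008 tools produced. On very small inputs the fixed container overhead alone can make the LZMA output larger than gzip's.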


And lzma performs really badly if you have little memory available. The "big"
secret of lzma is that it basically works with a huge window to look for
repeated data, and even decompression needs a fair amount of memory, which makes
it a really bad choice for git IMNSHO.
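The memory argument can be illustrated with Python's lzma module: an LZMA decoder has to hold a buffer as large as the dictionary (window) the encoder chose, whereas deflate's window never exceeds 32 KiB. In this sketch the helper names, the 64 KiB and 4 MiB dictionary sizes and the 1 MiB limit are arbitrary illustrative choices; the 1 MiB `memlimit` stands in for a memory-constrained machine.

```python
import lzma


def xz_compress(data: bytes, dict_size: int) -> bytes:
    """Compress `data` to .xz using an LZMA2 dictionary (window) of `dict_size` bytes."""
    # Start from preset 1 and override only the dictionary size.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 1, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def fits_in_memlimit(blob: bytes, memlimit: int) -> bool:
    """True if `blob` decompresses without the decoder needing more than `memlimit` bytes."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    try:
        decoder.decompress(blob)
    except lzma.LZMAError:
        # Raised here because the block header asks for a dictionary that
        # does not fit within memlimit, so liblzma refuses to allocate it.
        return False
    return True
```

With `data = b"commit tree parent author\n" * 4000`, the output of `xz_compress(data, 64 * 1024)` decodes under a 1 MiB limit, while `xz_compress(data, 4 * 1024 * 1024)` is rejected, even though the input is identical: the decoder's memory need follows the encoder's window size, not the data size.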

That said, I don't agree with you (and some others) that gzip is fast enough.
It's clearly a bottleneck in many log-related commands, which you would expect
to be I/O bound rather than CPU bound. LZO seems like a better trade-off,
especially since most of what compression gains comes from the biggest blobs,
i.e. the heads of the delta chains. It's really unclear to me whether we gain
anything by compressing the deltas, trees, and other small objects.
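One way to test the question of whether small objects are worth compressing would be to store an object uncompressed whenever deflate saves too little. The sketch below is purely hypothetical: the 10% threshold and the `encode_object`/`decode_object` helpers are made up, and this is not how git stores objects (git deflates every object regardless of size).

```python
import zlib

# Hypothetical policy knob: keep an object deflated only if that saves at least
# this fraction of its size. This is not an actual git setting.
MIN_SAVING = 0.10


def encode_object(data: bytes, min_saving: float = MIN_SAVING) -> tuple:
    """Return (is_compressed, payload) for one object."""
    packed = zlib.compress(data)
    if len(data) - len(packed) >= min_saving * len(data):
        return True, packed
    # Deflate did not pay off (tiny or incompressible object): store it raw.
    return False, data


def decode_object(is_compressed: bool, payload: bytes) -> bytes:
    """Inverse of encode_object."""
    return zlib.decompress(payload) if is_compressed else payload
```

A repetitive source blob ends up deflated, while a short object with no repeated substrings (such as `bytes(range(256))`) is stored as-is, so reading it back costs no inflate call at all.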

And when it comes to timings, here are the decompression times for a file big
enough to give meaningful numbers (best of 10 runs, smaller is better; the
second number is the size of the compressed data; the original data was 7.8 MB):
  * lzma: 0.374s (2.2 MB)
  * gzip: 0.127s (2.9 MB)
  * lzop: 0.053s (3.2 MB)

For a 300 KB original file:
  * lzma: 0.022s (124 KB)
  * gzip: 0.008s (144 KB)
  * lzop: 0.004s (156 KB) /* most of the samples were actually 0.005s */
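The "best of 10 runs" methodology can be sketched with Python's timeit module. This is only an approximation of such a measurement: LZO has no binding in Python's standard library, so only zlib (the deflate algorithm gzip uses, wrapped in a zlib rather than a gzip container) and lzma are compared, both at their default levels, and the helper names are made up for illustration.

```python
import lzma
import timeit
import zlib


def best_of(decompress, blob: bytes, runs: int = 10) -> float:
    """Fastest of `runs` timed single decompressions of `blob`, in seconds."""
    return min(timeit.repeat(lambda: decompress(blob), number=1, repeat=runs))


def compare(data: bytes, runs: int = 10) -> dict:
    """Map codec name -> (best decompression time in s, compressed size in bytes)."""
    codecs = {
        # Same deflate algorithm as gzip, in a zlib container.
        "gzip": (zlib.compress, zlib.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        blob = compress(data)
        assert decompress(blob) == data  # lossless round-trip sanity check
        results[name] = (best_of(decompress, blob, runs), len(blob))
    return results
```

Printing `compare(data)` for a given file yields pairs in the same shape as the lists above; absolute times depend entirely on the machine, so only the ratios between codecs are worth comparing.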

What is obvious to me is that lzop takes about 10% more space than gzip, while
being roughly 1.5 to 2.5 times faster to decompress. Of course this is very
sketchy, and a real test within git would be more telling.
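The space and speed ratios between lzop and gzip can be recomputed from the figures quoted above. The snippet only restates those numbers; nothing in it is a new measurement.

```python
# Figures quoted above: (decompression time in s, compressed size) per codec.
big = {"gzip": (0.127, 2.9), "lzop": (0.053, 3.2)}    # sizes in MB
small = {"gzip": (0.008, 144), "lzop": (0.004, 156)}  # sizes in KB


def lzop_vs_gzip(sample: dict) -> dict:
    """Extra space taken by lzop relative to gzip, and lzop's decompression speedup."""
    (g_time, g_size), (l_time, l_size) = sample["gzip"], sample["lzop"]
    return {"extra_space": l_size / g_size - 1, "speedup": g_time / l_time}


for label, sample in (("7.8 MB file", big), ("300 KB file", small)):
    r = lzop_vs_gzip(sample)
    print(f"{label}: {r['extra_space']:.0%} larger, {r['speedup']:.1f}x faster")
# 7.8 MB file: 10% larger, 2.4x faster
# 300 KB file: 8% larger, 2.0x faster
```

Using the more typical 0.005s lzop sample for the 300 KB file instead gives 0.008 / 0.005 = 1.6x.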
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

