From: Pierre Habouzit <madcoder@debian.org>
To: Nicolas Pitre <nico@cam.org>
Cc: Sam Vilain <sam@vilain.net>,
	Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Marco Costalba <mcostalba@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: Decompression speed: zip vs lzo
Date: Fri, 11 Jan 2008 10:45:16 +0100
Message-ID: <20080111094516.GD20141@artemis.madism.org>
In-Reply-To: <alpine.LFD.1.00.0801101332150.3054@xanadu.home>


On Thu, Jan 10, 2008 at 08:39:07PM +0000, Nicolas Pitre wrote:
> On Thu, 10 Jan 2008, Pierre Habouzit wrote:

> diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> index a39cb82..252b03e 100644
> --- a/builtin-pack-objects.c
> +++ b/builtin-pack-objects.c
> @@ -433,7 +433,10 @@ static unsigned long write_object(struct sha1file *f,
>  		}
>  		/* compress the data to store and put compressed length in datalen */
>  		memset(&stream, 0, sizeof(stream));
> -		deflateInit(&stream, pack_compression_level);
> +		if (obj_type == OBJ_REF_DELTA || obj_type == OBJ_OFS_DELTA)
> +			deflateInit(&stream, 0);
> +		else
> +			deflateInit(&stream, pack_compression_level);
>  		maxsize = deflateBound(&stream, size);
>  		out = xmalloc(maxsize);
>  		/* Compress it */
> 
> You then only need to run 'git repack -a -f -d' with and without the 
> above patch.

  Using a test of the form if (size <= 512) as a PoC instead, I get:

vanilla git:

$ du -k .git/**/*.pack
180808 .git/objects/pack/pack-7bc9f383c92cbffe366da2d2a62b67bb33a53365.pack
$ repeat 5 time git blame MAINTAINERS >|/dev/null
git blame MAINTAINERS >| /dev/null  7,34s user 0,09s system 99% cpu 7,433 total
git blame MAINTAINERS >| /dev/null  7,31s user 0,16s system 100% cpu 7,475 total
git blame MAINTAINERS >| /dev/null  7,35s user 0,08s system 100% cpu 7,431 total
git blame MAINTAINERS >| /dev/null  7,30s user 0,18s system 99% cpu 7,482 total
git blame MAINTAINERS >| /dev/null  7,33s user 0,16s system 99% cpu 7,492 total


With compression disabled for sizes <= 512:

$ du -k .git/**/*.pack
188840	.git/objects/pack/pack-7bc9f383c92cbffe366da2d2a62b67bb33a53365.pack
$ repeat 5 time git blame MAINTAINERS >|/dev/null
git blame MAINTAINERS >| /dev/null  7,06s user 0,09s system 100% cpu 7,150 total
git blame MAINTAINERS >| /dev/null  7,08s user 0,13s system 99% cpu 7,209 total
git blame MAINTAINERS >| /dev/null  7,07s user 0,08s system 99% cpu 7,168 total
git blame MAINTAINERS >| /dev/null  7,02s user 0,15s system 99% cpu 7,177 total
git blame MAINTAINERS >| /dev/null  7,07s user 0,13s system 99% cpu 7,243 total

Okay, the size barely budges (the pack is about 4% larger), which isn't
even fun. Still, we gain about 3.5% of wall-clock time.


Let's try a limit of 1024, then!

$ du -k .git/**/*.pack
201725	.git/objects/pack/pack-7bc9f383c92cbffe366da2d2a62b67bb33a53365.pack
$ repeat 5 time git blame MAINTAINERS >|/dev/null
git blame MAINTAINERS >| /dev/null  6,93s user 0,16s system 77% cpu 9,109 total
git blame MAINTAINERS >| /dev/null  6,88s user 0,08s system 99% cpu 6,965 total
git blame MAINTAINERS >| /dev/null  6,84s user 0,10s system 99% cpu 6,952 total
git blame MAINTAINERS >| /dev/null  6,86s user 0,12s system 99% cpu 6,983 total
git blame MAINTAINERS >| /dev/null  6,81s user 0,18s system 99% cpu 6,994 total


Okay, the pack grows by about 12%, and the blame takes about 6% less time.
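
For reference, here is how the pack growth can be recomputed from the
du -k figures above. This is only a small arithmetic sketch: the function
and constant names are made up for illustration and are not part of git.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent (positive = growth). */
double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Pack sizes in KiB, copied from the `du -k` output above. */
const double PACK_KIB_VANILLA = 180808.0;
const double PACK_KIB_512     = 188840.0;
const double PACK_KIB_1024    = 201725.0;

/* Print the growth of each test pack relative to the vanilla pack. */
void print_pack_growth(void)
{
	printf("512 limit:  %+.1f%%\n",
	       percent_change(PACK_KIB_VANILLA, PACK_KIB_512));
	printf("1024 limit: %+.1f%%\n",
	       percent_change(PACK_KIB_VANILLA, PACK_KIB_1024));
}
```

Calling print_pack_growth() from a main() gives the growth for both
limits relative to the vanilla pack.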


Okay, the numbers are still not that impressive, but my patch doesn't
touch _only_ deltas: as I said, it also affects log comments, so I've
redone my tests with git log and *TADAAAA*:

vanilla git:
    $ repeat 5 time git log >|/dev/null
    git log >| /dev/null  2,54s user 0,12s system 99% cpu 2,660 total
    git log >| /dev/null  2,52s user 0,12s system 99% cpu 2,653 total
    git log >| /dev/null  2,57s user 0,07s system 99% cpu 2,637 total
    git log >| /dev/null  2,56s user 0,09s system 99% cpu 2,659 total
    git log >| /dev/null  2,54s user 0,10s system 99% cpu 2,660 total

with the 512-octet limit:

    $ repeat 5 time git log >|/dev/null
    git log >| /dev/null  2,10s user 0,10s system 99% cpu 2,193 total
    git log >| /dev/null  2,08s user 0,10s system 99% cpu 2,189 total
    git log >| /dev/null  2,06s user 0,11s system 100% cpu 2,162 total
    git log >| /dev/null  2,04s user 0,13s system 100% cpu 2,172 total
    git log >| /dev/null  2,06s user 0,13s system 99% cpu 2,198 total

    That's already a time reduction of roughly 18%.


with the 1024-octet limit:
    $ repeat 5 time git log >|/dev/null
    git log >| /dev/null  1,39s user 0,12s system 99% cpu 1,512 total
    git log >| /dev/null  1,38s user 0,12s system 100% cpu 1,498 total
    git log >| /dev/null  1,41s user 0,10s system 99% cpu 1,514 total
    git log >| /dev/null  1,41s user 0,10s system 100% cpu 1,506 total
    git log >| /dev/null  1,40s user 0,10s system 100% cpu 1,504 total

    Yes, that's a 43% time reduction!
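
The git log reductions can be checked by averaging the five wall-clock
totals of each series. The sketch below does just that; the helper names
are hypothetical, and the sample values are simply the "total" column
from the runs above.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
double mean_seconds(const double *samples, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* Wall-clock totals in seconds for `git log`, copied from the runs above. */
const double LOG_VANILLA[5]    = { 2.660, 2.653, 2.637, 2.659, 2.660 };
const double LOG_LIMIT_512[5]  = { 2.193, 2.189, 2.162, 2.172, 2.198 };
const double LOG_LIMIT_1024[5] = { 1.512, 1.498, 1.514, 1.506, 1.504 };

/* Time saved by `test` relative to `base`, in percent of the base mean. */
double reduction_percent(const double *base, const double *test, size_t n)
{
	double b = mean_seconds(base, n);

	return (b - mean_seconds(test, n)) / b * 100.0;
}
```

For example, reduction_percent(LOG_VANILLA, LOG_LIMIT_1024, 5) gives the
saving for the 1024-octet limit.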

  As a side note, repacking with the 1024-octet limit takes 4:06 here,
and 4:26 without any limit, which is 8% less time. I know it doesn't
matter a lot since repack is a one-time operation, but still, it would
speed up git gc --auto, which is not something to neglect completely.


I say it's worth investigating a _lot_, and the patch is only this complicated:

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..f454929 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -433,7 +433,7 @@ static unsigned long write_object(struct sha1file *f,
                }
                /* compress the data to store and put compressed length in datalen */
                memset(&stream, 0, sizeof(stream));
-               deflateInit(&stream, pack_compression_level);
+               deflateInit(&stream, size > 1024 ? pack_compression_level : 0);
                maxsize = deflateBound(&stream, size);
                out = xmalloc(maxsize);
                /* Compress it */
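
Pulled out of the diff, the decision made by that one-line change looks
like the sketch below. Both pick_compression_level() and
SMALL_OBJECT_LIMIT are hypothetical names used only for illustration;
the patch itself keeps the ternary inline in write_object().

```c
#include <assert.h>

/* Hypothetical cut-off, mirroring the 1024 in the patch above. */
#define SMALL_OBJECT_LIMIT 1024UL

/*
 * Hypothetical helper (not part of git): return the zlib level to use
 * for an object of `size` bytes. Level 0 makes zlib emit stored
 * (uncompressed) blocks; objects larger than the limit keep the
 * configured level.
 */
int pick_compression_level(unsigned long size, int configured_level)
{
	/* zlib accepts -1 (Z_DEFAULT_COMPRESSION) and 0..9. */
	assert(configured_level >= -1 && configured_level <= 9);
	return size > SMALL_OBJECT_LIMIT ? configured_level : 0;
}
```

Note that the comparison is strict, so an object of exactly 1024 bytes
is still stored uncompressed.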


-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

