From: Sam Vilain <sam@vilain.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@cam.org>,
Pierre Habouzit <madcoder@debian.org>,
Git Mailing List <git@vger.kernel.org>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Marco Costalba <mcostalba@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: Decompression speed: zip vs lzo
Date: Sat, 12 Jan 2008 14:52:04 +1300
Message-ID: <47881D44.9060105@vilain.net>
In-Reply-To: <alpine.LFD.1.00.0801110759160.3148@woody.linux-foundation.org>
Linus Torvalds wrote:
>
> On Fri, 11 Jan 2008, Sam Vilain wrote:
>> The difference seems only barely measurable;
>
> Ok.
>
> It may be that it might help other cases, but that seems unlikely.
>
> The more likely answer is that it's either of:
>
> - yes, zlib uncompression is noticeable in profiles, but that the
> cold-cache access is simply the bigger problem, and getting rid of zlib
> just moves the expense to whatever other thing that needs to access it
> (memcpy, xdelta apply, whatever)
>
> or
>
> - I don't know exactly which patch you used (did you just do the
> "core.deltacompression=0" thing?), and maybe zlib is fairly expensive
> even for just the setup crud, even when it doesn't really need to be.
>
> but who knows..
Well, I think my figures agree with Pierre's: a 6-10% time saving for
'git annotate'.
I think Pierre has hit the nail on the head: skipping compression for
small objects is a clear win. He saw the obvious criterion, really.
Below, I've knocked it up as a config option that leaves the default
behaviour unchanged.
I can't help but speculate about what the general benefits would be of
having one or two of the best compression algorithms available (e.g.
lzop, or even lzma for the larger blobs). For example, if gzip only
offers substantial benefits over lzop on streams longer than X kB, use
lzop for the ones shorter than that.
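
A minimal sketch of that size-based choice, purely as an illustration.
The codec identifiers, the function name and both thresholds are
hypothetical; git itself only uses zlib, and nothing like this exists
in the patch below.

```c
#include <stddef.h>

/* Hypothetical codec identifiers (assumption: not part of git). */
enum codec { CODEC_LZO, CODEC_ZLIB, CODEC_LZMA };

/*
 * Pick a codec from the uncompressed size alone.
 * zlib_min and lzma_min are illustrative tunables, not measured values.
 */
static enum codec pick_codec(size_t size, size_t zlib_min, size_t lzma_min)
{
	if (size >= lzma_min)
		return CODEC_LZMA;	/* largest blobs: favour ratio */
	if (size >= zlib_min)
		return CODEC_ZLIB;	/* mid-sized streams: zlib as today */
	return CODEC_LZO;		/* short streams: favour speed */
}
```

Where the thresholds should sit ("X kB" above) would have to come from
measurement; the sketch only shows the shape of the decision.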
If the uncompressed objects are clustered in the pack, then they might
stream-compress a lot better, should they be transmitted over an http
transport with gzip encoding. In packs which should be as small as
possible, a format change could allow them to be distributed as one
compressed resource. The ordering of the objects would ideally be
selected to give optimum compression, which could add a saving akin
to bzip2 vs gzip, at the expense of having to scan the small objects
for mini-deltas and cluster together the objects which share them.
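
The clustering step alone might be sketched as below. This is an
assumption-laden illustration: struct entry, cluster_cmp() and
cluster_small_objects() are hypothetical names, the real pack ordering
has other constraints this ignores, and the mini-delta scan is not
attempted at all.

```c
#include <stdlib.h>

/* Hypothetical pack entry: only the fields this sketch needs. */
struct entry {
	unsigned long size;	/* uncompressed size in bytes */
	unsigned int order;	/* original position in the pack */
};

/* Threshold read by the comparator; set by cluster_small_objects(). */
static unsigned long cluster_min_size;

/* Small (stored) objects sort first; ties keep their original order. */
static int cluster_cmp(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;
	int xs = x->size < cluster_min_size;
	int ys = y->size < cluster_min_size;

	if (xs != ys)
		return ys - xs;
	return (x->order > y->order) - (x->order < y->order);
}

/* Move every entry below min_size to the front of the array. */
static void cluster_small_objects(struct entry *e, size_t n,
				  unsigned long min_size)
{
	cluster_min_size = min_size;
	qsort(e, n, sizeof(*e), cluster_cmp);
}
```

qsort() is not guaranteed to be stable, which is why the original
position is used as a tie-breaker.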
Well, interesting ideas anyway :)
Subject: [PATCH] pack-objects: add compressionMinSize option
Objects smaller than a page don't save much space when compressed, and
cause some overhead. Allow the user to specify a minimum size for
objects before they are compressed.
Credit: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
---
Documentation/config.txt | 5 +++++
builtin-pack-objects.c | 7 ++++++-
2 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1b6d6d6..245121e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -734,6 +734,11 @@ pack.compression::
compromise between speed and compression (currently equivalent
to level 6)."
+pack.compressionMinSize::
+ Objects smaller than this are not compressed. This can make
+ operations that deal with many small objects (such as log)
+ faster.
+
pack.deltaCacheSize::
The maximum memory in bytes used for caching deltas in
linkgit:git-pack-objects[1].
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..316b809 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -76,6 +76,7 @@ static int num_preferred_base;
static struct progress *progress_state;
static int pack_compression_level = Z_DEFAULT_COMPRESSION;
static int pack_compression_seen;
+static int compression_min_size = 0;
static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = 0;
@@ -433,7 +434,7 @@ static unsigned long write_object(struct sha1file *f,
}
/* compress the data to store and put compressed length in datalen */
memset(&stream, 0, sizeof(stream));
- deflateInit(&stream, pack_compression_level);
+ deflateInit(&stream, size >= compression_min_size ? pack_compression_level : 0);
maxsize = deflateBound(&stream, size);
out = xmalloc(maxsize);
/* Compress it */
@@ -1841,6 +1842,10 @@ static int git_pack_config(const char *k, const char *v)
pack_compression_seen = 1;
return 0;
}
+ if (!strcmp(k, "pack.compressionminsize")) {
+ compression_min_size = git_config_int(k, v);
+ return 0;
+ }
if (!strcmp(k, "pack.deltacachesize")) {
max_delta_cache_size = git_config_int(k, v);
return 0;
--
1.5.3.7.2095.gb2448-dirty
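
Read in isolation, the level selection in the write_object() hunk
amounts to the helper sketched below. The name effective_level() is
hypothetical and not part of the patch. One deliberate difference: the
patch compares an unsigned long size against an int option, so a
negative setting is converted to a very large unsigned value and every
object ends up at level 0; the sketch treats a non-positive setting as
"feature off" instead.

```c
/*
 * Return the zlib level to use for an object of `size` bytes.
 * Level 0 makes deflate emit stored (uncompressed) blocks, so the
 * object is still wrapped in a zlib stream, just not compressed.
 * A non-positive min_size disables the check (assumption: the patch
 * itself does not special-case negative values).
 */
static int effective_level(unsigned long size, int min_size, int level)
{
	if (min_size > 0 && size < (unsigned long)min_size)
		return 0;
	return level;
}
```

With the patch applied, the option would presumably be set with
something like "git config pack.compressionMinSize 4096"; the value
4096 is only an example matching the "smaller than a page" wording in
the commit message.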