* Compression speed for large files
@ 2006-07-03 11:13 Joachim B Haga
  2006-07-03 12:03 ` Alex Riesen
  2006-07-03 21:45 ` Compression speed for large files Jeff King
  0 siblings, 2 replies; 21+ messages in thread
From: Joachim B Haga @ 2006-07-03 11:13 UTC (permalink / raw)
To: git

I'm looking at doing version control of data files, potentially very
large, often binary. In git, committing large files is very slow; I have
tested with a 45MB file, which takes about 1 minute to check in (on an
Intel Core Duo 2GHz).

Most of the time is spent compressing the file. Would it be a good idea
to change the Z_BEST_COMPRESSION flag to zlib, at least for large files?
I have measured the time spent by git-commit with different flags in
sha1_file.c:

  method                  time (s)   object size (kB)
  Z_BEST_COMPRESSION        62.0         17136
  Z_DEFAULT_COMPRESSION     10.4         16536
  Z_BEST_SPEED               4.8         17071

In this case Z_BEST_COMPRESSION also compresses worse, but that's not the
major issue: the time is. Here are a couple of other data points, measured
with gzip -9, -6 and -1 (comparable to the Z_ flags above):

  129MB ascii data file
  method     time (s)   object size (kB)
  gzip -9      158          23066
  gzip -6       18          23619
  gzip -1        6          32304

  3MB ascii data file
  gzip -9      2.2            887
  gzip -6      0.7            912
  gzip -1      0.3           1134

So: is it a good idea to change to faster compression, at least for larger
files? From my (limited) testing I would suggest using Z_BEST_COMPRESSION
only for small files (perhaps <1MB?) and Z_DEFAULT_COMPRESSION/Z_BEST_SPEED
for larger ones.

-j.

^ permalink raw reply	[flat|nested] 21+ messages in thread
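[Editor's note: the level/speed tradeoff measured above can be reproduced outside git with zlib directly. This is a rough sketch; the payload below is a stand-in, not the original data files, so the absolute numbers will differ.]

```python
import time
import zlib

# Stand-in payload: moderately repetitive numeric text, a few MB
# (assumption; the original measurements used real 3-129 MB data files).
data = b"x = 1.234567e-05  y = 9.87e+01\n" * 230_000

for level in (1, 6, 9):  # Z_BEST_SPEED, Z_DEFAULT_COMPRESSION, Z_BEST_COMPRESSION
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    dt = time.perf_counter() - t0
    print(f"level {level}: {dt:6.2f}s  {len(out):>10} bytes")
```

As in the thread's measurements, the interesting quantity is how steeply time grows from level 1 to 9 relative to the (often small) size improvement.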
* Re: Compression speed for large files
  2006-07-03 11:13 Compression speed for large files Joachim B Haga
@ 2006-07-03 12:03 ` Alex Riesen
  2006-07-03 12:42   ` Elrond
  2006-07-03 13:32   ` Joachim Berdal Haga
  2006-07-03 21:45 ` Compression speed for large files Jeff King
  1 sibling, 2 replies; 21+ messages in thread
From: Alex Riesen @ 2006-07-03 12:03 UTC (permalink / raw)
To: Joachim B Haga; +Cc: git

On 7/3/06, Joachim B Haga <cjhaga@fys.uio.no> wrote:
> So: is it a good idea to change to faster compression, at least for larger
> files? From my (limited) testing I would suggest using Z_BEST_COMPRESSION only
> for small files (perhaps <1MB?) and Z_DEFAULT_COMPRESSION/Z_BEST_SPEED for
> larger ones.

Probably yes, as a per-repo config option.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 12:03 ` Alex Riesen
@ 2006-07-03 12:42   ` Elrond
  2006-07-03 13:44     ` Joachim B Haga
  2006-07-03 13:32   ` Joachim Berdal Haga
  1 sibling, 1 reply; 21+ messages in thread
From: Elrond @ 2006-07-03 12:42 UTC (permalink / raw)
To: git

Joachim B Haga <cjhaga <at> fys.uio.no> writes:
[...]
>   method                  time (s)   object size (kB)
>   Z_BEST_COMPRESSION        62.0         17136
>   Z_DEFAULT_COMPRESSION     10.4         16536
>   Z_BEST_SPEED               4.8         17071
>
> In this case Z_BEST_COMPRESSION also compresses worse,
[...]

I personally find that very interesting. Is this a known "issue" with
zlib? It suggests that, with different options, it's possible to create
smaller repositories, despite the 'advertised' (by zlib, not git) "best"
compression.

Alex Riesen <raa.lkml <at> gmail.com> writes:
[...]
> Probably yes, as a per-repo config option.

The option probably should be the size for which to start using
"default" compression.


    Elrond

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 12:42 ` Elrond
@ 2006-07-03 13:44   ` Joachim B Haga
  0 siblings, 0 replies; 21+ messages in thread
From: Joachim B Haga @ 2006-07-03 13:44 UTC (permalink / raw)
To: git

Elrond <elrond+kernel.org <at> samba-tng.org> writes:

> Joachim B Haga <cjhaga <at> fys.uio.no> writes:
> [...]
> > In this case Z_BEST_COMPRESSION also compresses worse,
> [...]
>
> I personally find that very interesting, is this a known "issue" with zlib?
> It suggests, that with different options, it's possible to create smaller
> repositories, despite the 'advertised' (by zlib, not git) "best" compression.

There are also other tunables in zlib, such as the balance between
Huffman coding (good for data files) and string matching (good for text
files). So with more knowledge of the data it should be possible to
compress even better. I'm not advocating tuning this in git though ;)

> Alex Riesen <raa.lkml <at> gmail.com> writes:
> [...]
> > Probably yes, as a per-repo config option.
>
> The option probably should be the size for which to start using
> "default" compression.

That is possible, too. I'm open to any decision or consensus, as long as
I get my commits in less than 10s :)

-j.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 12:03 ` Alex Riesen
  2006-07-03 12:42   ` Elrond
@ 2006-07-03 13:32   ` Joachim Berdal Haga
  [not found]     ` <Pine.LNX.4.64.0607031030150.1213@localhost.localdomain>
  2006-07-03 14:33     ` Nicolas Pitre
  1 sibling, 2 replies; 21+ messages in thread
From: Joachim Berdal Haga @ 2006-07-03 13:32 UTC (permalink / raw)
To: Alex Riesen; +Cc: git

Alex Riesen wrote:
> On 7/3/06, Joachim B Haga <cjhaga@fys.uio.no> wrote:
>> So: is it a good idea to change to faster compression, at least for
>> larger files? From my (limited) testing I would suggest using
>> Z_BEST_COMPRESSION only for small files (perhaps <1MB?) and
>> Z_DEFAULT_COMPRESSION/Z_BEST_SPEED for
>> larger ones.
>
> Probably yes, as a per-repo config option.

I can send a patch later. If it's to be a per-repo option, it's probably
too confusing with several values. Is it ok with

  core.compression = [-1..9]

where the numbers are the zlib/gzip constants:

  -1   = zlib default (currently 6)
   0   = no compression
  1..9 = various speed/size tradeoffs (9 is git default)

Btw; I just tested the kernel sources. With gzip only, but files
compressed individually:

  time find . -type f | xargs gzip -9 -c | wc -c

I found the space saving from -6 to -9 to be under 0.6%, at double the
CPU time. So perhaps Z_DEFAULT_COMPRESSION would be good as default.

-j

^ permalink raw reply	[flat|nested] 21+ messages in thread
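[Editor's note: the proposed [-1..9] mapping amounts to a small validator. A sketch follows, not git's actual C code; Python's zlib module happens to expose the same Z_* constants:]

```python
import zlib

def zlib_level_from_config(value: int) -> int:
    """Map a proposed core.compression value to a zlib level (sketch).

    -1 means the zlib default, 0 disables compression, and 1..9 trade
    speed for size (9 slowest). Anything else is rejected.
    """
    if value == -1:
        return zlib.Z_DEFAULT_COMPRESSION
    if 0 <= value <= zlib.Z_BEST_COMPRESSION:  # Z_BEST_COMPRESSION == 9
        return value
    raise ValueError(f"bad zlib compression level {value}")

print(zlib_level_from_config(9))  # git's old hardwired level
```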
[parent not found: <Pine.LNX.4.64.0607031030150.1213@localhost.localdomain>]
* Re: Compression speed for large files
  2006-07-03 13:32 ` Joachim Berdal Haga
       [not found]   ` <Pine.LNX.4.64.0607031030150.1213@localhost.localdomain>
@ 2006-07-03 14:33   ` Nicolas Pitre
  2006-07-03 14:54     ` Yakov Lerner
  2006-07-03 16:31     ` Linus Torvalds
  1 sibling, 2 replies; 21+ messages in thread
From: Nicolas Pitre @ 2006-07-03 14:33 UTC (permalink / raw)
To: Joachim Berdal Haga; +Cc: Alex Riesen, git

On Mon, 3 Jul 2006, Joachim Berdal Haga wrote:

> Alex Riesen wrote:
> > On 7/3/06, Joachim B Haga <cjhaga@fys.uio.no> wrote:
> > > So: is it a good idea to change to faster compression, at least for larger
> > > files? From my (limited) testing I would suggest using Z_BEST_COMPRESSION
> > > only for small files (perhaps <1MB?) and
> > > Z_DEFAULT_COMPRESSION/Z_BEST_SPEED for
> > > larger ones.
> >
> > Probably yes, as a per-repo config option.
>
> I can send a patch later. If it's to be a per-repo option, it's probably too
> confusing with several values. Is it ok with
>
> core.compression = [-1..9]
>
> where the numbers are the zlib/gzip constants,
> -1 = zlib default (currently 6)
> 0 = no compression
> 1..9 = various speed/size tradeoffs (9 is git default)

I think this makes a lot of sense, although IMHO I'd simply use
Z_DEFAULT_COMPRESSION everywhere and be done with it, without extra
complexity that isn't worth the size difference.


Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 14:33 ` Nicolas Pitre
@ 2006-07-03 14:54   ` Yakov Lerner
  2006-07-03 15:17     ` Johannes Schindelin
  2006-07-03 16:31   ` Linus Torvalds
  1 sibling, 1 reply; 21+ messages in thread
From: Yakov Lerner @ 2006-07-03 14:54 UTC (permalink / raw)
Cc: git

On 7/3/06, Nicolas Pitre <nico@cam.org> wrote:
> On Mon, 3 Jul 2006, Joachim Berdal Haga wrote:
>
> > Alex Riesen wrote:
> > > On 7/3/06, Joachim B Haga <cjhaga@fys.uio.no> wrote:
> > > > So: is it a good idea to change to faster compression, at least for larger
> > > > files? From my (limited) testing I would suggest using Z_BEST_COMPRESSION
> > > > only for small files (perhaps <1MB?) and
> > > > Z_DEFAULT_COMPRESSION/Z_BEST_SPEED for
> > > > larger ones.
> > >
> > > Probably yes, as a per-repo config option.
> >
> > I can send a patch later. If it's to be a per-repo option, it's probably too
> > confusing with several values. Is it ok with
> >
> > core.compression = [-1..9]
> >
> > where the numbers are the zlib/gzip constants,
> > -1 = zlib default (currently 6)
> > 0 = no compression
> > 1..9 = various speed/size tradeoffs (9 is git default)

It would be arguable whether, say, 10% better compression is worth 3-8x
slower compression. But 3-4% better compression at the cost of 3-8x
slower compression, as the data suggest? I think this begs for switching
the default to Z_DEFAULT_COMPRESSION.

Yakov

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 14:54 ` Yakov Lerner
@ 2006-07-03 15:17   ` Johannes Schindelin
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Schindelin @ 2006-07-03 15:17 UTC (permalink / raw)
To: Yakov Lerner; +Cc: git

Hi,

On Mon, 3 Jul 2006, Yakov Lerner wrote:

> It would be arguable whether, say, 10% better compression is worth
> x(3-8) slower compression. But 3-4% better compression at the cost of
> x(3-8) slower compression time as data suggest ? I think this begs for
> switching the default to Z_DEFAULT_COMPRESSION

The real problem, of course, is that you cannot know before you have
tried whether your data is really well compressible or not.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 21+ messages in thread
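[Editor's note: one pragmatic workaround for "you cannot know before you tried" (my sketch, not something proposed in the thread) is to deflate a small sample at the fastest level and only pay for a slow level when the sample actually shrinks:]

```python
import os
import zlib

def looks_compressible(sample: bytes, threshold: float = 0.9) -> bool:
    # Deflate a sample at the fastest level; if even that barely shrinks
    # the data, a slower level is unlikely to be worth the CPU time.
    if not sample:
        return False
    return len(zlib.compress(sample, 1)) < threshold * len(sample)

print(looks_compressible(b"int i;\n" * 1000))      # repetitive source-like text
print(looks_compressible(os.urandom(64 * 1024)))   # incompressible noise
```

The threshold and sample size are arbitrary knobs; the point is only that a cheap probe can steer the choice of level per file.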
* Re: Compression speed for large files
  2006-07-03 14:33 ` Nicolas Pitre
  2006-07-03 14:54   ` Yakov Lerner
@ 2006-07-03 16:31   ` Linus Torvalds
  2006-07-03 18:59     ` [PATCH] Make zlib compression level configurable, and change default Joachim B Haga
  2006-07-03 19:02     ` [PATCH] Use configurable zlib compression level everywhere Joachim B Haga
  1 sibling, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2006-07-03 16:31 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Joachim Berdal Haga, Alex Riesen, git

On Mon, 3 Jul 2006, Nicolas Pitre wrote:
> On Mon, 3 Jul 2006, Joachim Berdal Haga wrote:
> >
> > I can send a patch later. If it's to be a per-repo option, it's probably too
> > confusing with several values. Is it ok with
> >
> > core.compression = [-1..9]
> >
> > where the numbers are the zlib/gzip constants,
> > -1 = zlib default (currently 6)
> > 0 = no compression
> > 1..9 = various speed/size tradeoffs (9 is git default)
>
> I think this makes a lot of sense, although IMHO I'd simply use
> Z_DEFAULT_COMPRESSION everywhere and be done with it without extra
> complexity which aren't worth the size difference.

I think Z_DEFAULT_COMPRESSION is fine too - we've long since started
relying on pack-files and the delta compression for the _real_ size
improvements, and as such, the zlib compression is less important.

That said, the "core.compression" thing sounds good to me, and gives
people the ability to tune things for their loads.

		Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH] Make zlib compression level configurable, and change default.
  2006-07-03 16:31 ` Linus Torvalds
@ 2006-07-03 18:59   ` Joachim B Haga
  2006-07-03 19:33     ` Linus Torvalds
  2006-07-03 19:02   ` [PATCH] Use configurable zlib compression level everywhere Joachim B Haga
  1 sibling, 1 reply; 21+ messages in thread
From: Joachim B Haga @ 2006-07-03 18:59 UTC (permalink / raw)
To: git; +Cc: Nicolas Pitre, Linus Torvalds, Alex Riesen, Junio C Hamano

Make zlib compression level configurable, and change the default.

With the change in default, "git add ." on kernel dir is about
twice as fast as before, with only minimal (0.5%) change in
object size. The speed difference is even more noticeable
when committing large files, which is now up to 8 times faster.

The configurability is through setting core.compression = [-1..9]
which maps to the zlib constants; -1 is the default, 0 is no
compression, and 1..9 are various speed/size tradeoffs, 9
being slowest.

Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no)
---
 Documentation/config.txt |    6 ++++++
 cache.h                  |    1 +
 config.c                 |    5 +++++
 environment.c            |    1 +
 sha1_file.c              |    4 ++--
 5 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a04c5ad..ac89be7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -91,6 +91,12 @@ core.warnAmbiguousRefs::
 	If true, git will warn you if the ref name you passed it is ambiguous
 	and might match multiple refs in the .git/refs/ tree. True by default.
 
+core.compression::
+	An integer -1..9, indicating the compression level for objects that
+	are not in a pack file. -1 is the zlib and git default. 0 means no
+	compression, and 1..9 are various speed/size tradeoffs, 9 being
+	slowest.
+
 alias.*::
 	Command aliases for the gitlink:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 8719939..84770bf 100644
--- a/cache.h
+++ b/cache.h
@@ -183,6 +183,7 @@ extern int log_all_ref_updates;
 extern int warn_ambiguous_refs;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
+extern int zlib_compression_level;
 
 #define GIT_REPO_VERSION 0
 extern int repository_format_version;
diff --git a/config.c b/config.c
index ec44827..61563be 100644
--- a/config.c
+++ b/config.c
@@ -279,6 +279,11 @@ int git_default_config(const char *var,
 		return 0;
 	}
 
+	if (!strcmp(var, "core.compression")) {
+		zlib_compression_level = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "user.name")) {
 		strlcpy(git_default_name, value, sizeof(git_default_name));
 		return 0;
diff --git a/environment.c b/environment.c
index 3de8eb3..1d8ceef 100644
--- a/environment.c
+++ b/environment.c
@@ -20,6 +20,7 @@ int repository_format_version = 0;
 char git_commit_encoding[MAX_ENCODING_LENGTH] = "utf-8";
 int shared_repository = PERM_UMASK;
 const char *apply_default_whitespace = NULL;
+int zlib_compression_level = -1;
 
 static char *git_dir, *git_object_dir, *git_index_file,
 	*git_refs_dir, *git_graft_file;
diff --git a/sha1_file.c b/sha1_file.c
index 8179630..bc35808 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1458,7 +1458,7 @@ int write_sha1_file(void *buf, unsigned
 
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_BEST_COMPRESSION);
+	deflateInit(&stream, zlib_compression_level);
 	size = deflateBound(&stream, len+hdrlen);
 	compressed = xmalloc(size);
 
@@ -1511,7 +1511,7 @@ static void *repack_object(const unsigne
 
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_BEST_COMPRESSION);
+	deflateInit(&stream, zlib_compression_level);
 	size = deflateBound(&stream, len + hdrlen);
 	buf = xmalloc(size);
-- 
1.4.1.g8fced-dirty

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: [PATCH] Make zlib compression level configurable, and change default.
  2006-07-03 18:59 ` [PATCH] Make zlib compression level configurable, and change default Joachim B Haga
@ 2006-07-03 19:33   ` Linus Torvalds
  2006-07-03 19:50     ` Linus Torvalds
  2006-07-03 20:11     ` Joachim B Haga
  0 siblings, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2006-07-03 19:33 UTC (permalink / raw)
To: Joachim B Haga
Cc: Nicolas Pitre, Alex Riesen, Junio C Hamano, Git Mailing List

On Mon, 3 Jul 2006, Joachim B Haga wrote:
>
> The configurability is through setting core.compression = [-1..9]
> which maps to the zlib constants; -1 is the default, 0 is no
> compression, and 1..9 are various speed/size tradeoffs, 9
> being slowest.

My only worry is that this encodes "Z_DEFAULT_COMPRESSION" as being -1,
which happens to be /true/, but I don't think that's a documented
interface (you're supposed to use the Z_DEFAULT_COMPRESSION macro, which
could have any value, and just _happens_ to be -1).

Is it likely to ever change from that -1? Probably not. So I think your
patch is technically correct, but it might just be nicer if it did
something like

	if (!strcmp(var, "core.compression")) {
		int level = git_config_int(var, value);
		if (level == -1)
			level = Z_DEFAULT_COMPRESSION;
		else if (level < 0 || level > Z_BEST_COMPRESSION)
			die("bad zlib compression level %d", level);
		zlib_compression_level = level;
		return 0;
	}

which would be safer, and a smart compiler might notice that the -1 case
ends up being a no-op, and then just generate code AS IF we just had

	if (level < -1 || level > Z_BEST_COMPRESSION)
		die(...

there.

Oh, and for all the same reasons, we should use

	int zlib_compression_level = Z_BEST_COMPRESSION;

for the default initializer.

Hmm?

		Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] Make zlib compression level configurable, and change default.
  2006-07-03 19:33 ` Linus Torvalds
@ 2006-07-03 19:50   ` Linus Torvalds
  0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2006-07-03 19:50 UTC (permalink / raw)
To: Joachim B Haga
Cc: Nicolas Pitre, Alex Riesen, Junio C Hamano, Git Mailing List

On Mon, 3 Jul 2006, Linus Torvalds wrote:
>
> Oh, and for all the same reasons, we should use
>
>	int zlib_compression_level = Z_BEST_COMPRESSION;

That should be Z_DEFAULT_COMPRESSION, of course.

Anyway, I think the patches are ok as-is, and my suggestion to avoid the
"-1" and use Z_DEFAULT_COMPRESSION is really just an additional comment,
not anything fundamental. So Junio, feel free to add an

	Acked-by: Linus Torvalds <torvalds@osdl.org>

regardless of whether you also do that.

		Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] Make zlib compression level configurable, and change default.
  2006-07-03 19:33 ` Linus Torvalds
  2006-07-03 19:50   ` Linus Torvalds
@ 2006-07-03 20:11   ` Joachim B Haga
  1 sibling, 0 replies; 21+ messages in thread
From: Joachim B Haga @ 2006-07-03 20:11 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds, Junio C Hamano

Linus Torvalds <torvalds@osdl.org> writes:
> [snip suggested improvements]

Yes, that would be more... thorough. And, especially for the

	int zlib_compression_level = Z_DEFAULT_COMPRESSION;

line, more self-explanatory, too. So here's an updated patch (replacing
the previous) including your suggestions.

-j.

--
Make zlib compression level configurable, and change default.

With the change in default, "git add ." on kernel dir is about
twice as fast as before, with only minimal (0.5%) change in
object size. The speed difference is even more noticeable
when committing large files, which is now up to 8 times faster.

The configurability is through setting core.compression = [-1..9]
which maps to the zlib constants; -1 is the default, 0 is no
compression, and 1..9 are various speed/size tradeoffs, 9
being slowest.

Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no)
---
 Documentation/config.txt |    6 ++++++
 cache.h                  |    1 +
 config.c                 |   10 ++++++++++
 environment.c            |    1 +
 sha1_file.c              |    4 ++--
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a04c5ad..ac89be7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -91,6 +91,12 @@ core.warnAmbiguousRefs::
 	If true, git will warn you if the ref name you passed it is ambiguous
 	and might match multiple refs in the .git/refs/ tree. True by default.
 
+core.compression::
+	An integer -1..9, indicating the compression level for objects that
+	are not in a pack file. -1 is the zlib and git default. 0 means no
+	compression, and 1..9 are various speed/size tradeoffs, 9 being
+	slowest.
+
 alias.*::
 	Command aliases for the gitlink:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 8719939..84770bf 100644
--- a/cache.h
+++ b/cache.h
@@ -183,6 +183,7 @@ extern int log_all_ref_updates;
 extern int warn_ambiguous_refs;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
+extern int zlib_compression_level;
 
 #define GIT_REPO_VERSION 0
 extern int repository_format_version;
diff --git a/config.c b/config.c
index ec44827..b23f4bf 100644
--- a/config.c
+++ b/config.c
@@ -279,6 +279,16 @@ int git_default_config(const char *var,
 		return 0;
 	}
 
+	if (!strcmp(var, "core.compression")) {
+		int level = git_config_int(var, value);
+		if (level == -1)
+			level = Z_DEFAULT_COMPRESSION;
+		else if (level < 0 || level > Z_BEST_COMPRESSION)
+			die("bad zlib compression level %d", level);
+		zlib_compression_level = level;
+		return 0;
+	}
+
 	if (!strcmp(var, "user.name")) {
 		strlcpy(git_default_name, value, sizeof(git_default_name));
 		return 0;
diff --git a/environment.c b/environment.c
index 3de8eb3..43823ff 100644
--- a/environment.c
+++ b/environment.c
@@ -20,6 +20,7 @@ int repository_format_version = 0;
 char git_commit_encoding[MAX_ENCODING_LENGTH] = "utf-8";
 int shared_repository = PERM_UMASK;
 const char *apply_default_whitespace = NULL;
+int zlib_compression_level = Z_DEFAULT_COMPRESSION;
 
 static char *git_dir, *git_object_dir, *git_index_file,
 	*git_refs_dir, *git_graft_file;
diff --git a/sha1_file.c b/sha1_file.c
index 8179630..bc35808 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1458,7 +1458,7 @@ int write_sha1_file(void *buf, unsigned
 
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_BEST_COMPRESSION);
+	deflateInit(&stream, zlib_compression_level);
 	size = deflateBound(&stream, len+hdrlen);
 	compressed = xmalloc(size);
 
@@ -1511,7 +1511,7 @@ static void *repack_object(const unsigne
 
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_BEST_COMPRESSION);
+	deflateInit(&stream, zlib_compression_level);
 	size = deflateBound(&stream, len + hdrlen);
 	buf = xmalloc(size);
-- 
1.4.1.g8fced-dirty

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* [PATCH] Use configurable zlib compression level everywhere.
  2006-07-03 16:31 ` Linus Torvalds
  2006-07-03 18:59   ` [PATCH] Make zlib compression level configurable, and change default Joachim B Haga
@ 2006-07-03 19:02   ` Joachim B Haga
  2006-07-03 19:43     ` Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: Joachim B Haga @ 2006-07-03 19:02 UTC (permalink / raw)
To: git; +Cc: Nicolas Pitre, Linus Torvalds, Alex Riesen, Junio C Hamano

This one I'm not so sure about; it's for completeness. But I don't
actually use git and haven't tested beyond the git add / git commit
stage. Still...

Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no)
---
 csum-file.c |    2 +-
 diff.c      |    2 +-
 http-push.c |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index ebaad03..6a7b40f 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -122,7 +122,7 @@ int sha1write_compressed(struct sha1file
 	void *out;
 
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_DEFAULT_COMPRESSION);
+	deflateInit(&stream, zlib_compression_level);
 	maxsize = deflateBound(&stream, size);
 	out = xmalloc(maxsize);
 
diff --git a/diff.c b/diff.c
index 5a71489..428ff78 100644
--- a/diff.c
+++ b/diff.c
@@ -583,7 +583,7 @@ static unsigned char *deflate_it(char *d
 	z_stream stream;
 
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_BEST_COMPRESSION);
+	deflateInit(&stream, zlib_compression_level);
 	bound = deflateBound(&stream, size);
 	deflated = xmalloc(bound);
 	stream.next_out = deflated;
diff --git a/http-push.c b/http-push.c
index e281f70..f761584 100644
--- a/http-push.c
+++ b/http-push.c
@@ -492,7 +492,7 @@ static void start_put(struct transfer_re
 
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_BEST_COMPRESSION);
+	deflateInit(&stream, zlib_compression_level);
 	size = deflateBound(&stream, len + hdrlen);
 	request->buffer.buffer = xmalloc(size);
-- 
1.4.1.g8fced-dirty

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: [PATCH] Use configurable zlib compression level everywhere.
  2006-07-03 19:02 ` [PATCH] Use configurable zlib compression level everywhere Joachim B Haga
@ 2006-07-03 19:43   ` Junio C Hamano
  2006-07-07 21:53     ` David Lang
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2006-07-03 19:43 UTC (permalink / raw)
To: Joachim B Haga; +Cc: git

Joachim B Haga <cjhaga@fys.uio.no> writes:

> This one I'm not so sure about, it's for completeness. But I don't actually use
> git and haven't tested beyond the git add / git commit stage. Still...
>
> Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no)

You made a good judgement to notice that these three are different.

 * sha1write_compressed() in csum-file.c is for producing packs,
   and most of the things we compress there are deltas and less
   compressible, so even when core.compression is set high we
   might be better off using faster compression.

 * diff's deflate_it() is about producing binary diffs (later encoded
   in base85) for textual transfer. Again, it is almost always used to
   compress deltas, so the same comment as above applies to this.

 * http-push uses it to send a compressed whole object, and this is
   only used over the network, so it is plausible that the user would
   want to use a different compression level than the usual
   core.compression.

It is fine by me to use the same core.compression for these three. If
somebody comes up with a workload that benefits from having different
settings for them, we can add separate variables, falling back on the
default core.compression if there isn't one, as needed.

Thanks for the patches.

^ permalink raw reply	[flat|nested] 21+ messages in thread
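[Editor's note: the "separate variables, falling back on the default core.compression" scheme Junio describes amounts to a simple lookup chain. A sketch with hypothetical key names; only core.compression actually existed at this point in the thread:]

```python
import zlib

def compression_level(config, area=None):
    # Per-area override (key names like "pack.compression" are
    # hypothetical), falling back to core.compression, then to
    # the zlib default.
    if area is not None:
        key = f"{area}.compression"
        if key in config:
            return config[key]
    return config.get("core.compression", zlib.Z_DEFAULT_COMPRESSION)

cfg = {"core.compression": 1, "httppush.compression": 6}
print(compression_level(cfg, "pack"))      # no pack override -> core value, 1
print(compression_level(cfg, "httppush"))  # per-area override -> 6
```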
* Re: [PATCH] Use configurable zlib compression level everywhere.
  2006-07-03 19:43 ` Junio C Hamano
@ 2006-07-07 21:53   ` David Lang
  2006-07-08  2:10     ` Johannes Schindelin
  0 siblings, 1 reply; 21+ messages in thread
From: David Lang @ 2006-07-07 21:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Joachim B Haga, git

On Mon, 3 Jul 2006, Junio C Hamano wrote:

> * sha1write_compressed() in csum-file.c is for producing packs
>   and most of the things we compress there are deltas and less
>   compressible, so even when core.compression is set to high we
>   might be better off using faster compression.

Why would deltas have poor compression? I'd expect them to have about
the same as the files they are deltas of (or slightly better, due to the
fact that the delta metainfo is highly repetitive).

David Lang

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] Use configurable zlib compression level everywhere.
  2006-07-07 21:53 ` David Lang
@ 2006-07-08  2:10   ` Johannes Schindelin
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Schindelin @ 2006-07-08 2:10 UTC (permalink / raw)
To: David Lang; +Cc: Junio C Hamano, Joachim B Haga, git

Hi,

On Fri, 7 Jul 2006, David Lang wrote:

> On Mon, 3 Jul 2006, Junio C Hamano wrote:
>
> > * sha1write_compressed() in csum-file.c is for producing packs
> >   and most of the things we compress there are deltas and less
> >   compressible, so even when core.compression is set to high we
> >   might be better off using faster compression.
>
> why would deltas have poor compression? I'd expect them to have about the same
> as the files they are deltas of (or slightly better due to the fact that the
> delta metainfo is highly repetitive)

Deltas should have poor compression by definition, because compression
tries to encode those parts of the file more efficiently which do not
bear much information (think entropy). If you have deltas which really
make sense, they are almost _pure_ information, i.e. they do not contain
much redundancy, as compared to real files.

So, the compression (which does not know anything about the
characteristics of deltas in particular) cannot take much redundancy out
of the delta. Therefore, the entropy is very high, and the compression
rate is low.

Hope this makes sense to you,
Dscho

^ permalink raw reply	[flat|nested] 21+ messages in thread
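[Editor's note: the entropy argument above is easy to see numerically: data that is mostly redundancy deflates to a fraction of its size, while dense, high-entropy bytes barely shrink at all. A sketch using random bytes as a stand-in for a "pure information" delta:]

```python
import os
import zlib

redundant = b"static int counter;\n" * 50_000  # low entropy, source-like
dense = os.urandom(len(redundant))             # high entropy, delta-like stand-in

for name, buf in (("redundant", redundant), ("dense", dense)):
    ratio = len(zlib.compress(buf, 9)) / len(buf)
    print(f"{name:9s} compresses to {ratio:.1%} of original size")
```

This is only an analogy; real pack deltas are not random bytes, but they sit much closer to the "dense" end than the files they describe, which is why spending Z_BEST_COMPRESSION time on them pays off so little.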
* Re: Compression speed for large files
  2006-07-03 11:13 Compression speed for large files Joachim B Haga
  2006-07-03 12:03 ` Alex Riesen
@ 2006-07-03 21:45 ` Jeff King
  2006-07-03 22:25   ` Joachim Berdal Haga
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2006-07-03 21:45 UTC (permalink / raw)
To: Joachim B Haga; +Cc: git

On Mon, Jul 03, 2006 at 11:13:34AM +0000, Joachim B Haga wrote:

> often binary. In git, committing of large files is very slow; I have
> tested with a 45MB file, which takes about 1 minute to check in (on an
> intel core-duo 2GHz).

I know this has already been somewhat solved, but I found your numbers
curiously high. I work quite a bit with git and large files and I
haven't noticed this slowdown. Can you be more specific about your load?
Are you sure it is zlib?

On my 1.8GHz Athlon, compressing 45MB of zeros into 20K takes about 2s.
Compressing 45MB of random data into a 45MB object takes 6.3s. In either
case, the commit takes only about 0.5s (since cogito stores the object
during the cg-add).

Is there some specific file pattern which is slow to compress?

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 21:45 ` Compression speed for large files Jeff King
@ 2006-07-03 22:25   ` Joachim Berdal Haga
  2006-07-03 23:02     ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Joachim Berdal Haga @ 2006-07-03 22:25 UTC (permalink / raw)
To: Jeff King; +Cc: Joachim B Haga, git

Jeff King wrote:
> On Mon, Jul 03, 2006 at 11:13:34AM +0000, Joachim B Haga wrote:
>
>> often binary. In git, committing of large files is very slow; I have
>> tested with a 45MB file, which takes about 1 minute to check in (on an
>> intel core-duo 2GHz).
>
> I know this has already been somewhat solved, but I found your numbers
> curiously high. I work quite a bit with git and large files and I
> haven't noticed this slowdown. Can you be more specific about your load?
> Are you sure it is zlib?

Quite sure: at least to the extent that it is fixed by lowering the
compression level. But the wording was inexact: it's during object
creation, which happens at the initial "git add" and then later during
"git commit". But...

> On my 1.8GHz Athlon, compressing 45MB of zeros into 20K takes about 2s.
> Compressing 45MB of random data into a 45MB object takes 6.3s. In either
> case, the commit takes only about 0.5s (since cogito stores the object
> during the cg-add).
>
> Is there some specific file pattern which is slow to compress?

Yes, it seems so. At least the effect is much more pronounced for my
files than for random/null data. "My" files are in this context
generated data files, binary or ascii.

Here's a test with "time gzip -[169] -c file >/dev/null". Random data is
from /dev/urandom; kernel headers are a concatenation of *.h in the
kernel sources. All times in seconds, on my puny home computer (1GHz Via
Nehemiah):

       random (23MB)   data (23MB)   headers (44MB)
  -9       10.2            72.5          38.5
  -6       10.2            13.5          12.9
  -1        9.9             4.1           7.0

So... data dependent, yes. But it hits even for normal source code.
(Btw; the default (-6) seems to be less data dependent than the other
values. Maybe that's on purpose.)

If you want to look at a highly-variable dataset (the one above), try
http://lupus.ig3.net/SIMULATION.dx.gz (5MB, slow server), but that's
just an example; I see the same variability for example also on binary
data files.

-j.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 22:25 ` Joachim Berdal Haga
@ 2006-07-03 23:02   ` Linus Torvalds
  2006-07-04  5:42     ` Joachim Berdal Haga
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2006-07-03 23:02 UTC (permalink / raw)
To: Joachim Berdal Haga; +Cc: Jeff King, Joachim B Haga, git

On Tue, 4 Jul 2006, Joachim Berdal Haga wrote:
>
> Here's a test with "time gzip -[169] -c file >/dev/null". Random data
> from /dev/urandom, kernel headers are concatenation of *.h in kernel
> sources. All times in seconds, on my puny home computer (1GHz Via Nehemiah)

That "Via Nehemiah" is probably a big part of it.

I think the VIA Nehemiah just has a 64kB L2 cache, and I bet performance
plummets if the tables end up being used past that. And I think a large
part of the higher compressions is that they allow the compression
window and tables to grow bigger.

		Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Compression speed for large files
  2006-07-03 23:02 ` Linus Torvalds
@ 2006-07-04  5:42   ` Joachim Berdal Haga
  0 siblings, 0 replies; 21+ messages in thread
From: Joachim Berdal Haga @ 2006-07-04 5:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff King, git

Linus Torvalds wrote:
>
> On Tue, 4 Jul 2006, Joachim Berdal Haga wrote:
>> Here's a test with "time gzip -[169] -c file >/dev/null". Random data
>> from /dev/urandom, kernel headers are concatenation of *.h in kernel
>> sources. All times in seconds, on my puny home computer (1GHz Via Nehemiah)
>
> That "Via Nehemiah" is probably a big part of it.
>
> I think the VIA Nehemiah just has a 64kB L2 cache, and I bet performance
> plummets if the tables end up being used past that.

Not really. The numbers in my original post were from an Intel Core Duo;
they were 158/18/6 s for comparable (but larger) data. And on a P4
1.8GHz with 512kB L2, the same 23MB data file compresses in
28.1/5.9/1.3 s. That's a factor 22 slowest/fastest; the VIA was only a
factor 18, so the difference is actually *larger*.

-j.

^ permalink raw reply	[flat|nested] 21+ messages in thread
end of thread, other threads:[~2006-07-08 2:10 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-07-03 11:13 Compression speed for large files Joachim B Haga
2006-07-03 12:03 ` Alex Riesen
2006-07-03 12:42 ` Elrond
2006-07-03 13:44 ` Joachim B Haga
2006-07-03 13:32 ` Joachim Berdal Haga
[not found] ` <Pine.LNX.4.64.0607031030150.1213@localhost.localdomain>
2006-07-03 14:33 ` Nicolas Pitre
2006-07-03 14:54 ` Yakov Lerner
2006-07-03 15:17 ` Johannes Schindelin
2006-07-03 16:31 ` Linus Torvalds
2006-07-03 18:59 ` [PATCH] Make zlib compression level configurable, and change default Joachim B Haga
2006-07-03 19:33 ` Linus Torvalds
2006-07-03 19:50 ` Linus Torvalds
2006-07-03 20:11 ` Joachim B Haga
2006-07-03 19:02 ` [PATCH] Use configurable zlib compression level everywhere Joachim B Haga
2006-07-03 19:43 ` Junio C Hamano
2006-07-07 21:53 ` David Lang
2006-07-08 2:10 ` Johannes Schindelin
2006-07-03 21:45 ` Compression speed for large files Jeff King
2006-07-03 22:25 ` Joachim Berdal Haga
2006-07-03 23:02 ` Linus Torvalds
2006-07-04 5:42 ` Joachim Berdal Haga
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).