From: Sam Vilain <sam@vilain.net>
To: Steven Grimm <koreth@midwinter.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
Junio C Hamano <junkio@cox.net>,
Daniel Barkalow <barkalow@iabervon.org>,
Theodore Ts'o <tytso@mit.edu>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Add --no-reuse-delta option to git-gc
Date: Sun, 10 Jun 2007 19:40:00 +1200
Message-ID: <466BAAD0.9060408@vilain.net>
In-Reply-To: <20070509191052.GD3141@spearce.org>

Shawn O. Pearce wrote:
>> On that note, has any thought been given to looking at other compression
>> algorithms? Gzip is a great high-speed compressor, but there are others
>> out there (some a bit slower, some much slower at both compression and
>> decompression) that produce substantially smaller output.
>>
> It's been discussed once before on the list, in very recent history,
> but not by a whole lot. As Junio pointed out, I don't think there
> ever really was any discussion of whether gzip is the best way to
> deflate the objects. I think gzip was just chosen simply because it
> was readily available in libz, stable, and has a pretty decent
> speed/size ratio.
>

I think it's the right tool. I just don't see any point in changing to
anything slower for the sake of a 20% space saving. Especially bzip2.
Consider this.

Compression works primarily through two things: Huffman coding and
string matching. The larger the window for your string matching, the
slower the compression, and the more memory you have thrashing your CPU
cache when decompressing.
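
To make the window point concrete, here is a minimal sketch using
Python's zlib module (deflate, the same algorithm gzip uses). The helper
name repeat_ratio and the block sizes are made up for illustration. It
compresses a random block followed by an exact copy of itself: when the
copy starts within deflate's 32 KiB match window the second half nearly
vanishes, and when it starts further back the matcher cannot see it at
all.

```python
import random
import zlib


def repeat_ratio(block_size, seed=0):
    """Compress block+block and return compressed size / input size.

    The block is pseudo-random, so the only redundancy available to the
    compressor is the exact repeat that starts block_size bytes back.
    """
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    data = block + block
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    # 16 KiB repeat distance fits inside the 32 KiB deflate window;
    # 128 KiB does not.
    for size in (16 * 1024, 128 * 1024):
        print("%7d bytes back: ratio %.3f" % (size, repeat_ratio(size)))
```

The first ratio should come out near 0.5 and the second near 1.0, which
is the whole argument in miniature: redundancy outside the window is
invisible to the string matcher.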

Now, I'm not an expert on compression algorithms, but I think a large
part of the reason gzip is blindingly fast compared to bzip2 is that
gzip uses a 64k buffer and bzip2 a 900k one. Only now are CPUs getting
caches large enough to deal with that size of buffer; the rest of the
time you're waiting for your RAM. Moore's law was supposed to make bzip2
fast one of these days, but I'm still waiting.

But with git-repack the window is effectively the size of your
repository, so that blows bzip2 out of the water. Why else can git make
compressed packs smaller than a .bz2 of the raw files? This is the same
observation Shawn makes with the pack-wide dictionary, but he sounds
like he wants to apply it to the Huffman coding stage as well as the
current delta/string matching stage. Now that would be interesting...
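
As a rough illustration of why a reference to earlier content helps so
much, the sketch below compresses a new version of a file twice with
Python's zlib: once on its own, and once with the old version supplied
as a preset dictionary. This is not git's delta format or pack layout,
only an analogy for "the matcher can see the previous version". The
helpers make_versions, deflate and inflate are hypothetical names, and
the file contents are synthetic.

```python
import random
import zlib


def make_versions(seed=0, lines=200):
    """Return (old, new): two synthetic files differing in one line."""
    rng = random.Random(seed)
    v1 = ["%032x" % rng.getrandbits(128) for _ in range(lines)]
    v2 = list(v1)
    v2[lines // 2] = "%032x" % rng.getrandbits(128)  # the single edit
    return "\n".join(v1).encode(), "\n".join(v2).encode()


def deflate(data, zdict=None):
    """Deflate data, optionally priming the window with zdict."""
    if zdict is None:
        comp = zlib.compressobj()
    else:
        comp = zlib.compressobj(zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob, zdict):
    """Inverse of deflate(data, zdict); needs the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    old, new = make_versions()
    alone = deflate(new)
    shared = deflate(new, zdict=old)
    assert inflate(shared, old) == new
    print("new version alone:       %d bytes" % len(alone))
    print("new version, old as dict: %d bytes" % len(shared))
```

The old version here is small enough to fit in deflate's window; the
point about repack is that delta selection is not bound by that limit.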

Anyway, it's a free world, so be my guest and implement it. I guess if
this were selectable, it would only be a minor annoyance waiting a bit
longer when pulling from some repositories, and it would be interesting
to see whether it made a big difference to pack file sizes.

Sam