From: Junio C Hamano <gitster@pobox.com>
To: Sam Vilain <sam@vilain.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Nicolas Pitre <nico@cam.org>,
	Pierre Habouzit <madcoder@debian.org>,
	Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Marco Costalba <mcostalba@gmail.com>
Subject: Re: Decompression speed: zip vs lzo
Date: Fri, 11 Jan 2008 20:46:01 -0800
Message-ID: <7vzlvby7li.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <47881D44.9060105@vilain.net> (Sam Vilain's message of "Sat, 12 Jan 2008 14:52:04 +1300")

Sam Vilain <sam@vilain.net> writes:

> If the uncompressed objects are clustered in the pack, then they might
> stream compress a lot better, should they be transmitted over an http
> transport with gzip encoding.

That would only have been a sensible optimization in the older
native pack protocol, where we always exploded the transferred
packfile.  These days, however, we tend to keep the packfile and
re-index it at the receiving end (the http transport never
exploded the packfile, and it still doesn't).  When packs are
used that way, laying out objects in the packfile so as to ignore
recency order and cluster them by delta chain, which you are
advocating to reduce the transfer overhead, is a bad tradeoff.
Your packs will be kept in the form you chose for transport,
which is a layout that hurts runtime performance.  And you
keep using those suboptimal packs many times over, getting hurt
every time.

> @@ -433,7 +434,7 @@ static unsigned long write_object(struct sha1file *f,
>  		}
>  		/* compress the data to store and put compressed length in datalen */
>  		memset(&stream, 0, sizeof(stream));
> -		deflateInit(&stream, pack_compression_level);
> +		deflateInit(&stream, size >= compression_min_size ? pack_compression_level : 0);
>  		maxsize = deflateBound(&stream, size);
>  		out = xmalloc(maxsize);
>  		/* Compress it */

I very much like the simplicity of the patch.  If such a simple
approach can give us a clear performance gain, I am all for it.

Benchmarks on different repositories need to back that up,
though.
