From: Joachim Berdal Haga <c.j.b.haga@fys.uio.no>
To: Jeff King <peff@peff.net>
Cc: Joachim B Haga <cjhaga@student.matnat.uio.no>, git@vger.kernel.org
Subject: Re: Compression speed for large files
Date: Tue, 04 Jul 2006 00:25:37 +0200
Message-ID: <44A99961.8090504@fys.uio.no>
In-Reply-To: <20060703214503.GA3897@coredump.intra.peff.net>

Jeff King wrote:
> On Mon, Jul 03, 2006 at 11:13:34AM +0000, Joachim B Haga wrote:
> 
>> often binary. In git, committing of large files is very slow; I have
>> tested with a 45MB file, which takes about 1 minute to check in (on an
>> intel core-duo 2GHz).
> 
> I know this has already been somewhat solved, but I found your numbers
> curiously high. I work quite a bit with git and large files and I
> haven't noticed this slowdown. Can you be more specific about your load?
> Are you sure it is zlib?

Quite sure, at least to the extent that it is fixed by lowering the
compression level. But my wording was inexact: the slowdown is during
object creation, which happens at the initial "git add" and then again
later during "git commit".
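
To make "object creation" concrete, here is a minimal sketch in Python of
what storing one file as a loose blob roughly involves: a short header is
prepended, the result is hashed with SHA-1, and the same bytes are deflated
with zlib. The function names are hypothetical and this is not git's actual
code; it only illustrates where the compression level enters.

```python
import hashlib
import time
import zlib


def make_loose_object(content: bytes, level: int = zlib.Z_DEFAULT_COMPRESSION):
    """Return (sha1_hex, compressed_bytes) for a git-style blob object.

    Hypothetical helper: mimics the loose-object layout
    ("blob <size>\\0" + content, hashed, then deflated).
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    sha1_hex = hashlib.sha1(store).hexdigest()
    # The zlib level is the only knob here; lower levels trade size for speed.
    compressed = zlib.compress(store, level)
    return sha1_hex, compressed


def time_object_creation(content: bytes, level: int):
    """Return (seconds, compressed_size) for creating one object."""
    start = time.perf_counter()
    _, compressed = make_loose_object(content, level)
    return time.perf_counter() - start, len(compressed)
```

Calling time_object_creation(data, 9) and time_object_creation(data, 1) on
the same large buffer should show whether the deflate step dominates.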

But...

> On my 1.8Ghz Athlon, compressing 45MB of zeros into 20K takes about 2s.
> Compressing 45MB of random data into a 45MB object takes 6.3s. In either
> case, the commit takes only about 0.5s (since cogito stores the object
> during the cg-add).
> 
> Is there some specific file pattern which is slow to compress? 

Yes, it seems so. At least the effect is much more pronounced for my
files than for random/null data. "My" files, in this context, are
generated data files, either binary or ascii.

Here's a test with "time gzip -[169] -c file >/dev/null". The random data
is from /dev/urandom; the kernel headers are a concatenation of *.h in the
kernel sources. All times are in seconds, on my puny home computer (1GHz
Via Nehemiah).

level  random (23MB)  data (23MB)   headers (44MB)
-9     10.2           72.5          38.5
-6     10.2           13.5          12.9
-1      9.9            4.1           7.0

So... data dependent, yes. But it hits even for normal source code.
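
A comparable measurement can be sketched in-process with Python's zlib
module, which uses the same deflate algorithm as gzip. This is only an
illustration under assumptions: the sample inputs below are small synthetic
stand-ins (not my data files or the kernel headers), the helper name
time_levels is made up, and absolute timings will differ from the gzip
command line.

```python
import os
import time
import zlib


def time_levels(data: bytes, levels=(1, 6, 9)):
    """Return {level: (seconds, compressed_size)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        results[level] = (time.perf_counter() - start, len(compressed))
    return results


if __name__ == "__main__":
    size = 1024 * 1024  # small illustrative size, far below the 23MB/44MB above
    # Slowly varying ASCII floats as a rough stand-in for generated data files.
    ascii_numbers = "".join(f"{i * 0.001:.6e}\n" for i in range(80000)).encode()
    samples = {
        "random": os.urandom(size),
        "zeros": bytes(size),
        "ascii": ascii_numbers,
    }
    for name, data in samples.items():
        for level, (seconds, out_size) in time_levels(data).items():
            print(f"{name:8s} -{level}  {seconds:8.3f}s  {out_size:9d} bytes")
```

Substituting the contents of a real file for one of the samples shows how
strongly the gap between -1 and -9 depends on the data.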

(Btw, the default (-6) seems to be less data dependent than the other
values. Maybe that's on purpose.)

If you want to look at a highly variable dataset (the "data" column above),
try http://lupus.ig3.net/SIMULATION.dx.gz (5MB, slow server). That's just
an example, though; I see the same variability on binary data files as well.

-j.
