From: Joachim Berdal Haga <c.j.b.haga@fys.uio.no>
To: Jeff King <peff@peff.net>
Cc: Joachim B Haga <cjhaga@student.matnat.uio.no>, git@vger.kernel.org
Subject: Re: Compression speed for large files
Date: Tue, 04 Jul 2006 00:25:37 +0200
Message-ID: <44A99961.8090504@fys.uio.no>
In-Reply-To: <20060703214503.GA3897@coredump.intra.peff.net>
Jeff King wrote:
> On Mon, Jul 03, 2006 at 11:13:34AM +0000, Joachim B Haga wrote:
>
>> often binary. In git, committing of large files is very slow; I have
>> tested with a 45MB file, which takes about 1 minute to check in (on an
>> intel core-duo 2GHz).
>
> I know this has already been somewhat solved, but I found your numbers
> curiously high. I work quite a bit with git and large files and I
> haven't noticed this slowdown. Can you be more specific about your load?
> Are you sure it is zlib?
Quite sure, at least to the extent that lowering the compression level
fixes it. But my wording was inexact: the slowdown happens during object
creation, which occurs at the initial "git add" and then again later
during "git commit".
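For context on where that compression happens: a loose object is the zlib-deflated form of a short header ("blob", a space, the decimal size, a NUL byte) followed by the file contents, and the object name is the SHA-1 of the uncompressed header plus contents. The Python sketch below mimics that step with a selectable level. It is not git's code, and `make_loose_blob` is a hypothetical name used only for illustration.

```python
import hashlib
import zlib


def make_loose_blob(content: bytes, level: int = zlib.Z_DEFAULT_COMPRESSION):
    """Hypothetical helper: return (sha1_hex, deflated_bytes) for a blob.

    Mimics git's loose-object layout: the header "blob <size>" plus a NUL
    byte, then the content; that buffer is hashed with SHA-1 and then
    zlib-compressed at `level` (0-9, or -1 for zlib's default, which zlib
    documents as currently equivalent to 6).
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    # The name covers the *uncompressed* bytes, so the level changes only
    # the on-disk size and the CPU time, never the object name.
    name = hashlib.sha1(store).hexdigest()
    # In this sketch, the deflate call is the expensive step for large inputs.
    return name, zlib.compress(store, level)


if __name__ == "__main__":
    for level in (1, 6, 9):
        name, packed = make_loose_blob(b"hello world\n", level)
        print(level, name, len(packed))
```

Because the hash is taken before compression, changing the level is safe for existing repositories: the same content always gets the same object name.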
But...
> On my 1.8GHz Athlon, compressing 45MB of zeros into 20K takes about 2s.
> Compressing 45MB of random data into a 45MB object takes 6.3s. In either
> case, the commit takes only about 0.5s (since cogito stores the object
> during the cg-add).
>
> Is there some specific file pattern which is slow to compress?
Yes, it seems so. At least, the effect is much more pronounced for my
files than for random or all-zero data. In this context, "my" files are
generated data files, binary or ASCII.
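The zeros-versus-random comparison, plus a generated-data case, can be reproduced in miniature with Python's zlib module, which wraps the same deflate implementation. This sketch only measures compressed size, not time. `synthetic_ascii` is a made-up generator standing in for simulation output; it is an assumption, not the data set discussed in this thread.

```python
import math
import os
import zlib


def synthetic_ascii(n_lines: int) -> bytes:
    """Hypothetical stand-in for generated ASCII output: rows of fixed-width floats."""
    rows = []
    for i in range(n_lines):
        x = i * 0.001
        # 12 + 1 + 14 + 1 + 12 + 1 = 41 bytes per row for these value ranges.
        rows.append(f"{x:12.6f} {x * x:14.8f} {math.sin(x):12.8f}\n")
    return "".join(rows).encode("ascii")


def compressed_size(data: bytes, level: int) -> int:
    """Length of `data` after zlib compression at `level`."""
    return len(zlib.compress(data, level))


if __name__ == "__main__":
    size = 1 << 20  # 1 MiB samples keep the demo quick
    samples = {
        "zeros": bytes(size),
        "random": os.urandom(size),
        "ascii": synthetic_ascii(size // 41),  # roughly 1 MiB of rows
    }
    for name, data in samples.items():
        sizes = [compressed_size(data, level) for level in (1, 6, 9)]
        print(f"{name:>6}: {len(data):>8} bytes -> -1/-6/-9: {sizes}")
```

Zeros collapse to almost nothing and random data does not shrink at all. Structured text falls in between, which is also where the match search has the most work to do.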
Here's a test with "time gzip -[169] -c file >/dev/null". The random data
is from /dev/urandom; the kernel headers are a concatenation of the *.h
files in the kernel sources. All times are in seconds, on my puny home
computer (1GHz VIA Nehemiah):
  level   random (23MB)   data (23MB)   headers (44MB)
   -9         10.2            72.5           38.5
   -6         10.2            13.5           12.9
   -1          9.9             4.1            7.0
So... data-dependent, yes. But it hits even normal source code: the
kernel headers take 38.5s at -9 versus 7.0s at -1.
(By the way, the default (-6) seems to be less data-dependent than the
other levels. Maybe that's on purpose.)
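The gzip runs above can be approximated in-process with Python's zlib bindings, which is closer to what git does, since git links zlib directly rather than spawning gzip. This is a minimal timing sketch, not the harness behind the table. `time_levels` is a hypothetical name, and the numbers will not match gzip's exactly because file I/O and gzip's framing are excluded.

```python
import sys
import time
import zlib


def time_levels(data: bytes, levels=(1, 6, 9), repeat: int = 1):
    """Return {level: (best_seconds, compressed_size)} for zlib.compress at each level.

    Roughly analogous to `time gzip -N -c file >/dev/null`, but in-process.
    """
    results = {}
    for level in levels:
        best = float("inf")
        size = 0
        for _ in range(repeat):
            start = time.perf_counter()
            out = zlib.compress(data, level)
            best = min(best, time.perf_counter() - start)
            size = len(out)
        results[level] = (best, size)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python time_levels.py FILE
    with open(sys.argv[1], "rb") as f:
        payload = f.read()
    for level, (secs, size) in time_levels(payload, repeat=3).items():
        print(f"-{level}: {secs:8.3f} s {size:>12} bytes")
```

A plausible explanation for the pattern, if I read zlib's deflate.c correctly: each level caps how many hash-chain candidates the match finder examines (4 at -1, 128 at -6, 4096 at -9). Random data yields few candidate matches whatever the level, while data full of short repeats, such as ASCII numbers, makes -9 walk very long chains.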
If you want to look at a highly variable dataset (the one above), try
http://lupus.ig3.net/SIMULATION.dx.gz (5MB, slow server). That's just an
example, though; I see the same variability on binary data files too.
-j.