From: David Kastrup <dak@gnu.org>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Christian Jaeger <chrjae@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: git gc --aggressive led to about 40 times slower "git log --raw"
Date: Tue, 18 Feb 2014 11:25:03 +0100
Message-ID: <87ioscsoow.fsf@fencepost.gnu.org>
In-Reply-To: <CACsJy8D9tws_gu6yWVdz3t+Vfg5-9iorptn4BLnTL3b+YWcHzQ@mail.gmail.com> (Duy Nguyen's message of "Tue, 18 Feb 2014 16:45:25 +0700")
Duy Nguyen <pclouds@gmail.com> writes:
> On Tue, Feb 18, 2014 at 3:55 PM, David Kastrup <dak@gnu.org> wrote:
>
>> I've seen the same with my ongoing work on git-blame with the current
>> Emacs Git mirror. Aggressive packing reduces the repository size to
>> about a quarter, but it blows up the system time (mainly I/O)
>> significantly, quite reducing the total benefits of my algorithmic
>> improvements there.
>
> Likely because --aggressive passes --depth=250 to pack-objects. Long
> delta chains could reduce pack size and increase I/O as well as zlib
> processing significantly.
Increased zlib processing time is one thing, but if it _increases_ I/O,
then it would seem there is a serious impedance mismatch between the
compression scheme and the code relying on it, leading to repeated reads
of blocks only needed for reconstructing dynamic compression
dictionaries.
Compression should reduce rather than increase the total amount of
reads. So it would seem that some combination of better caching,
smaller independent block sizes, and strategies for sorting the delta
chain so that its resolution requires mostly linear reads is called
for, done in a manner that does not reinitialize the decompression for
each access to a delta that happens to be more or less "in sequence".
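
To put rough numbers on the caching point, here is a toy model in Python.
It is only a sketch under stated assumptions: one linear delta chain, an
LRU cache of reconstructed objects, and every pack entry costing one
read. The names count_reads, parent and cache_size are made up for
illustration and are not Git's internals.

```python
from collections import OrderedDict

def count_reads(access_order, parent, cache_size):
    """Count pack-entry reads needed to reconstruct objects in the given order.

    parent[i] is the delta base of object i, or None for a full object.
    cache_size is the number of reconstructed objects an LRU cache may hold.
    """
    cache = OrderedDict()
    reads = 0
    for obj in access_order:
        # Walk towards the base until we reach a cached object or the root.
        chain = []
        cur = obj
        while cur is not None and cur not in cache:
            chain.append(cur)
            cur = parent[cur]
        if cur is not None:
            cache.move_to_end(cur)  # mark the cached base as recently used
        # Read and apply the missing entries, starting nearest the base.
        for node in reversed(chain):
            reads += 1
            cache[node] = True
            while len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return reads

# Hypothetical chain: object 0 is stored whole, object i is a delta on i-1,
# giving a maximum depth of 250.
parent = [None] + list(range(250))

no_cache = count_reads(range(251), parent, 0)
tiny_cache_forward = count_reads(range(251), parent, 1)
tiny_cache_backward = count_reads(reversed(range(251)), parent, 1)
```

In this model the chain costs 31626 reads with no cache and 251 reads
with a one-entry cache when accessed base-first; accessed in the reverse
order, the one-entry cache does not help at all. So the access order
matters as much as the cache does.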
Of course, this is assuming that the additional time is spent
uncompressing data rather than navigating directories.
It's actually conceivable that there is quite a bit of potential to get
better performance from unchanged readers by packing stuff in a
different order while still using the same delta chain depth.
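
A similarly rough sketch of the ordering point, again a toy model rather
than the real pack layout: the pack is treated as a flat list of entries,
and a read counts as a seek whenever it does not directly follow the
previous one. The helpers count_seeks and resolution_offsets and the
object names are hypothetical.

```python
def resolution_offsets(layout, chain):
    """Map each object in a chain to its position in the pack layout."""
    position = {obj: index for index, obj in enumerate(layout)}
    return [position[obj] for obj in chain]

def count_seeks(offsets):
    """Count reads that do not directly follow the previously read entry."""
    seeks = 0
    previous = None
    for offset in offsets:
        if previous is None or offset != previous + 1:
            seeks += 1
        previous = offset
    return seeks

# Two chains of identical depth; only the on-disk order differs.
chain_a = ["a0", "a1", "a2", "a3"]
interleaved = ["a0", "b0", "a1", "b1", "a2", "b2", "a3", "b3"]
grouped = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]

seeks_interleaved = count_seeks(resolution_offsets(interleaved, chain_a))
seeks_grouped = count_seeks(resolution_offsets(grouped, chain_a))
```

With the interleaved layout every entry of the chain needs its own seek
(4 in total), while the grouped layout needs a single seek followed by
linear reads, at the same delta chain depth.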
--
David Kastrup