git.vger.kernel.org archive mirror
From: David Kastrup <dak@gnu.org>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Philippe Vaucher <philippe.vaucher@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Christian Jaeger <chrjae@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: git gc --aggressive led to about 40 times slower "git log --raw"
Date: Thu, 20 Feb 2014 17:48:21 +0100
Message-ID: <8738jdspbe.fsf@fencepost.gnu.org>
In-Reply-To: <CACsJy8DsC9X=13iEpONcT6bw6qTw_O586_vZ2W_3O42ajEPF4A@mail.gmail.com> (Duy Nguyen's message of "Wed, 19 Feb 2014 17:14:46 +0700")

Duy Nguyen <pclouds@gmail.com> writes:

> I can think of two improvements we could make, either increase cache
> size dynamically (within limits) or make it configurable. If we have N
> entries in worktree (both trees and blobs) and depth M, then we might
> need to cache N*M objects for it to be effective. Christian, if you
> want to experiment with this, update MAX_DELTA_CACHE in sha1_file.c and
> rebuild.

Well, my optimized "git-blame" code takes a considerable hit from an
aggressively packed Emacs repository, so I took a look at it with the
MAX_DELTA_CACHE value set to the default 256, and then 512, 1024, 2048.

Here are the results, in that order:

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m17.496s
user	0m30.552s
sys	0m46.496s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m13.888s
user	0m30.060s
sys	0m43.420s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m16.415s
user	0m31.436s
sys	0m44.564s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m24.732s
user	0m34.416s
sys	0m49.808s

So using a value of 512 helps a bit (about 5% in real time, 7% in system
time), but further increases already cause a hit.  My machine has 4G of
memory (32bit x86), so it is
unlikely that memory is running out.  I have no idea why this would be
so: either memory locality plays a role here, or the cache for some
reason gets reinitialized or scanned/copied/accessed as a whole
repeatedly, defeating the idea of a cache.  Or the access patterns are
such that it's entirely useless as a cache even at this size.

Trying with 16384:
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	2m8.000s
user	0m54.968s
sys	1m12.624s

And memory consumption never exceeded about 200m during the run, so it
stayed far below what would have been available.

Something's _really_ fishy about that cache behavior.  Note that the
_system_ time goes up considerably, not just user time.  Since the packs
are zlib-packed, it's reasonable that more I/O time is also associated
with more user time and it is quite possible that the user time increase
is entirely explainable by the larger amount of compressed data to
access.

But this stinks.  I doubt that the additional time is spent in memory
allocation: most of that would register only as user time.  And the
total allocated memory is not large enough that one can explain this
away with fewer available disk buffers for the kernel: the aggressively
packed repo takes about 300m so it would fit into memory together with
the git process.

-- 
David Kastrup
