git.vger.kernel.org archive mirror
From: Nicolas Pitre <nico@fluxnic.net>
To: Jeff King <peff@peff.net>
Cc: Michael Poole <mdpoole@troilus.org>, Miles Bader <miles@gnu.org>,
	Michael Witten <mfwitten@gmail.com>,
	Frans Pop <elendil@planet.nl>,
	git@vger.kernel.org
Subject: Re: 'git gc --aggressive' effectively unusable
Date: Mon, 05 Apr 2010 17:07:32 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.1004051615070.7232@xanadu.home>
In-Reply-To: <20100404214944.GA15104@coredump.intra.peff.net>

On Sun, 4 Apr 2010, Jeff King wrote:

> On Sun, Apr 04, 2010 at 04:38:50PM -0400, Jeff King wrote:
> 
> > I packed Frans' sample kernel repo with "git gc --aggressive" last
> > night. It did finish after about 9 hours. I didn't take memory usage
> > measurements, but here's what time said:
> > 
> >   real    535m38.898s
> >   user    216m46.437s
> >   sys     0m24.186s
> > 
> > That's 3.6 hours of CPU time over almost 9 hours (on a dual-core
> > machine). The non-aggressive pack was about 680M, and the result was
> > 480M. The machine has 2G of RAM, and not much else running. So I would
> > really not expect there to be much disk I/O required, but clearly we
> > were waiting quite a bit.
> > 
> > I'll try tweaking a few of the pack memory limits and try again.
> 
> Hmm, this may be relevant:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/67791/focus=94797
> 
> In my experiments, memory usage is increasing but valgrind doesn't
> report leaks. So perhaps it is fragmentation in the memory allocator.

To verify this, simply try with pack.threads = 1.  That should keep the 
memory allocator from fragmenting allocations randomly across threads.

Also, going multithreaded _may_ be faster only if you can afford the 
increased memory usage.  Especially with gc --aggressive, each thread is 
adding its own share of memory usage in the delta window.
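
To make the per-thread cost concrete, here is a rough sketch of the 
arithmetic.  It assumes every window slot holds one uncompressed object 
of the same size and ignores delta indexes and other bookkeeping; the 
function name and the 1 MiB object size are illustrative only, not taken 
from Git:

```python
# Rough upper bound on delta-window memory during a threaded repack.
# Assumption: each window slot holds one uncompressed object of the
# same size; delta indexes and other bookkeeping are ignored.

MIB = 1024 * 1024


def window_memory_bytes(threads, window, object_size):
    """Each thread keeps its own window, so the cost scales with threads."""
    return threads * window * object_size


# Illustrative numbers only: 1 MiB objects and a 250-entry window.
one_thread = window_memory_bytes(threads=1, window=250, object_size=MIB)
four_threads = window_memory_bytes(threads=4, window=250, object_size=MIB)

print(one_thread // MIB, "MiB with 1 thread")
print(four_threads // MIB, "MiB with 4 threads")
```

Under those assumptions, going from one thread to four multiplies the 
window memory by four, which is why dropping to a single thread is the 
first thing to try on a machine that is short on RAM.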

First thing to try for the biggest possible improvement is 
pack.threads=1.  On a quad-core machine this means repacking up to 4 
times slower, but that is still much faster than the 100 times slower 
you get when the system starts swapping.  That might even make the 
resulting pack a tad tighter due to delta windows not being fragmented 
across different threads.

If that is not enough, then try:

	pack.deltaCacheSize = 1
	core.packedGitWindowSize = 16m
	core.packedGitLimit = 128m

This should reduce Git's memory usage while making it slower without 
affecting the packing outcome.  Again "slower" could mean "much faster" 
if reducing memory usage avoids swapping completely.
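
One possible way to apply the three settings above is through "git 
config".  The small sketch below only prints the command lines so they 
can be reviewed before being run inside the repository; the SETTINGS 
table and the helper name are made up for illustration:

```python
import shlex

# The three memory-related settings suggested in the message.
SETTINGS = {
    "pack.deltaCacheSize": "1",
    "core.packedGitWindowSize": "16m",
    "core.packedGitLimit": "128m",
}


def config_commands(settings):
    """Return one 'git config <key> <value>' command line per setting."""
    return [
        shlex.join(["git", "config", key, value])
        for key, value in settings.items()
    ]


# Print the commands rather than running them, so nothing is changed
# until the lines are pasted into a shell inside the repository.
for line in config_commands(SETTINGS):
    print(line)
```

Run without --global, these commands only affect the repository they 
are executed in, which makes it easy to undo the experiment afterwards.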

If that still doesn't help much, then the next tweaks will affect the 
packing result:

	pack.windowMemory = 256m

Here 256m is arbitrary and must be guessed from the size of the objects 
being packed.  The idea is to let smallish objects completely fill the 
search window (it has 250 entries by default with --aggressive) while 
preventing a window full of huge objects from eating up all memory.  If 
there is still swapping going on then you can try 64m instead.  That 
means that if you have a large set of 1MB objects then the delta search 
window will be scaled down to fewer than 64 entries in that case.  This 
is why packing might be less optimal, as there are fewer delta 
combinations being considered.
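
The scaling described above can be approximated as follows.  This is a 
simplified model and not Git's actual code: it assumes equally sized 
objects and counts only the object data against the limit, so it gives 
an upper bound (the real window ends up somewhat smaller because of 
per-entry overhead).  The helper names are hypothetical:

```python
# Size suffixes as used in Git configuration values (powers of 1024).
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a value such as '64m' into a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def effective_window(window, window_memory, object_size):
    """Upper bound on the window entries that fit under the memory cap."""
    limit = parse_size(window_memory)
    fits = max(1, limit // object_size)  # always keep at least one entry
    return min(window, fits)


MIB = 1024 * 1024
print(effective_window(250, "64m", MIB))    # 1 MiB objects, 64m cap
print(effective_window(250, "256m", 4096))  # small objects fill the window
```

With 1 MiB objects and a 64m cap the model yields 64 entries at most, 
while small objects still get the full 250-entry window.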

If this still doesn't prevent swapping then you should really consider 
installing more RAM.  There are fundamental object accounting structures 
that can hardly be shrunk such as struct object_entry in 
builtin/pack-objects.c, and one instance of such structure is needed for 
each object.  On a 64-bit machine this structure occupies 120 bytes, 
meaning 2M objects require 240MB of RAM just for that.  The data set 
also has to fit in the file cache to avoid I/O thrashing.  So if your 
repository is larger than the available RAM then some thrashing is almost 
unavoidable.  Sometimes a badly packed repository may require 2GB of 
disk space in the .git directory alone while the fully packed version is 
only a few hundred megabytes.  Such repositories may need to be repacked 
on a big machine first, before machines with less RAM are able to handle 
them.
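
The accounting overhead mentioned above is simple to estimate for any 
repository.  A minimal sketch, taking the 120-byte figure from this 
message as given (it may differ in other Git versions) and using a 
hypothetical helper name:

```python
# Size of one struct object_entry on a 64-bit machine, as quoted in
# the message; treat it as an assumption for other Git versions.
OBJECT_ENTRY_BYTES = 120


def accounting_bytes(num_objects, entry_bytes=OBJECT_ENTRY_BYTES):
    """Memory needed for one accounting structure per object."""
    return num_objects * entry_bytes


total = accounting_bytes(2_000_000)
print(total / 1_000_000, "MB for 2M objects")
```

The object count to plug in can be taken from the output of "git 
count-objects -v", adding up the loose ("count") and packed ("in-pack") 
figures.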

Hope this helps.


Nicolas

