git.vger.kernel.org archive mirror
From: Nicolas Pitre <nico@fluxnic.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>,
	Jay Soffian <jaysoffian@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Shawn Pearce <spearce@spearce.org>
Subject: Re: gc --aggressive
Date: Tue, 01 May 2012 13:17:03 -0400 (EDT)
Message-ID: <alpine.LFD.2.02.1205011259200.21030@xanadu.home>
In-Reply-To: <20120501162806.GA15614@sigill.intra.peff.net>

On Tue, 1 May 2012, Jeff King wrote:

> On Sun, Apr 29, 2012 at 09:53:31AM -0400, Nicolas Pitre wrote:
> 
> > But my remark was related to the fact that you need to double the 
> > affected resources to gain marginal improvements at some point.  This is 
> > true about computing hardware too: eventually you need way more gates 
> > and spend much more $$$ to gain some performance, and the added 
> > performance is never linear with the spending.
> 
> Right, I agree with that. The trick is just finding the right spot on
> that curve for each repo to maximize the reward/effort ratio.

Absolutely, at least for the default settings.  However, that is not what 
--aggressive is meant to do.

> > >   1. Should we bump our default window size? The numbers above show that
> > >      typical repos would benefit from jumping to 20 or even 40.
> > 
> > I think this might be a good indication that the number of objects is a 
> > bad metric to size the window, as I mentioned previously.
> > 
> > Given that you have the test repos already, could you re-run it with 
> > --window=1000 and play with --window-memory instead?  I would be curious 
> > to see if this provides more predictable results.
> 
> It doesn't help. The git.git repo does well with about a 1m window
> limit. linux-2.6 is somewhere between 1m and 2m. But the phpmyadmin repo
> wants more like 16m. So it runs into the same issue as using object
> counts.
> 
> But it's much, much worse than that. Here are the actual numbers (same
> format as before; left-hand column is either window size (if no unit) or
> window-memory limit (if k/m unit), followed by resulting pack size, its
> percentage of baseline --window=10 pack, the user CPU time and finally
> its percentage of the baseline):
> [...]

Ouch!  Well... so much for good theory.  I'm still really surprised and 
disappointed as I didn't expect such damage at all.

However, this may be a good baseline for choosing a default 
window-memory value.  Given your numbers, we clearly see that good 
packing can be achieved with relatively little memory, so it might be a 
good idea not to leave this parameter unbounded by default, in order to 
catch potential pathological cases.  Maybe 64M would be a good default 
value?  Having a repack process eat up more than 16GB of RAM because its 
memory usage is unbounded is certainly not nice.
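To make the trade-off concrete, here is a simplified Python model (not 
git's actual pack-objects code) of a delta search window bounded both by 
an object count, as with --window, and by the total size of the 
candidates it holds, as with --window-memory.  The function name and the 
exact eviction rule are illustrative assumptions.

```python
from collections import deque


def window_candidates(sizes, window, window_memory=None):
    """Simplified model of a bounded delta search window (not git's code).

    sizes: object sizes in bytes, in delta-search order.
    window: maximum number of candidates kept (cf. --window).
    window_memory: optional byte cap on the candidates' total size
        (cf. --window-memory); None means unbounded.

    Returns, for each target index i, the list of earlier indices that
    are still in the window when i is examined.
    """
    result = []
    win = deque()  # indices of candidates currently held, oldest first
    mem = 0        # total bytes held by the candidates in win
    for i, size in enumerate(sizes):
        # Candidates available to target i are whatever is held right now.
        result.append(list(win))
        # Keep the current object so later targets can delta against it.
        win.append(i)
        mem += size
        # Enforce the object-count limit by evicting the oldest entries.
        while len(win) > window:
            mem -= sizes[win.popleft()]
        # Enforce the memory limit, but always keep at least one candidate.
        while window_memory is not None and mem > window_memory and len(win) > 1:
            mem -= sizes[win.popleft()]
    return result
```

With a 1000-byte cap, 100-byte objects still get 10 candidates each, but 
1000-byte objects get only one.  This is why a memory limit tuned for one 
repository does not transfer well to another, while still putting a hard 
ceiling on RAM usage in pathological cases.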

> > Maybe we could look at the size reduction within the delta search loop.  
> > If the reduction quickly diminishes as tested objects are further away 
> > from the target one then the window doesn't have to be very large, 
> > whereas if the reduction remains more or less constant then it might be 
> > worth searching further.  That could be used to dynamically size the 
> > window at run time.
> 
> I really like the idea of dynamically sizing the window based on what we
> find. If it works. I don't think there's any reason you couldn't have 50
> absolutely terrible delta candidates followed by one really amazing
> delta candidate. But maybe in practice the window tends to get
> progressively worse due to the heuristics, and outliers are unlikely. I
> guess we'd have to experiment.

Yes.  The idea is to continue searching if results are not progressively 
becoming worse fast enough.  Coming up with a good way to infer that is 
far from obvious though.
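One possible shape for such a cutoff, as a minimal Python sketch: the 
function name and the `patience`, `min_gain` and `max_window` parameters 
are hypothetical and not part of git, and the sketch assumes the caller 
can compute a delta size for each candidate, nearest candidate first.  
The search keeps going while results keep improving meaningfully and 
stops once they have stopped improving for a while.

```python
def dynamic_delta_search(candidates, delta_size, max_window=1000,
                         patience=10, min_gain=0.01):
    """Hypothetical adaptive delta window (illustration only, not git).

    candidates: candidate objects, nearest to the target first.
    delta_size: function(candidate) -> delta size in bytes, or None if
        no usable delta could be produced.
    Stops after `patience` consecutive candidates fail to shrink the best
    delta by at least `min_gain` (a fraction of the current best).

    Returns (best_candidate, best_size, number_examined).
    """
    best, best_size = None, None
    stale = 0      # consecutive candidates without a meaningful gain
    examined = 0
    for cand in candidates[:max_window]:
        examined += 1
        size = delta_size(cand)
        if size is not None and (best_size is None or size < best_size):
            # Any smaller delta becomes the new best...
            significant = best_size is None or size < best_size * (1 - min_gain)
            best, best_size = cand, size
            # ...but only a significant gain resets the staleness counter.
            stale = 0 if significant else stale + 1
        else:
            stale += 1
        if stale >= patience:
            break
    return best, best_size, examined
```

Note that this sketch shows exactly the failure mode Jeff describes: 50 
poor candidates followed by one excellent one would stop the search after 
`patience` misses and never reach the good delta.  Whether such outliers 
matter in practice is the part that would need experimenting.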


Nicolas


Thread overview: 26+ messages
2012-04-17 16:16 gc --aggressive Jay Soffian
2012-04-17 17:53 ` Jay Soffian
2012-04-17 20:52   ` Matthieu Moy
2012-04-17 21:58     ` Jeff King
2012-04-28 12:25     ` Jeff King
2012-04-28 17:11       ` Nicolas Pitre
2012-04-29 11:34         ` Jeff King
2012-04-29 13:53           ` Nicolas Pitre
2012-05-01 16:28             ` Jeff King
2012-05-01 17:16               ` Jeff King
2012-05-01 17:59                 ` Nicolas Pitre
2012-05-01 18:47                   ` Junio C Hamano
2012-05-01 19:22                     ` Nicolas Pitre
2012-05-01 20:01                     ` Jeff King
2012-05-01 19:35                   ` Jeff King
2012-05-01 20:02                     ` Nicolas Pitre
2012-05-01 17:17               ` Nicolas Pitre [this message]
2012-05-01 17:22                 ` Jeff King
2012-05-01 17:47                   ` Nicolas Pitre
2012-04-28 16:56   ` Nicolas Pitre
2012-04-17 22:08 ` Jeff King
2012-04-17 22:17   ` Junio C Hamano
2012-04-17 22:18     ` Jeff King
2012-04-17 22:34       ` Junio C Hamano
2012-04-28 16:42         ` Nicolas Pitre
2012-04-18  8:49       ` Andreas Ericsson
