From: Nicolas Pitre <nico@fluxnic.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>,
Jay Soffian <jaysoffian@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Shawn Pearce <spearce@spearce.org>
Subject: Re: gc --aggressive
Date: Tue, 01 May 2012 13:17:03 -0400 (EDT)
Message-ID: <alpine.LFD.2.02.1205011259200.21030@xanadu.home>
In-Reply-To: <20120501162806.GA15614@sigill.intra.peff.net>
On Tue, 1 May 2012, Jeff King wrote:
> On Sun, Apr 29, 2012 at 09:53:31AM -0400, Nicolas Pitre wrote:
>
> > But my remark was related to the fact that you need to double the
> > affected resources to gain marginal improvements at some point. This is
> > true about computing hardware too: eventually you need way more gates
> > and spend much more $$$ to gain some performance, and the added
> > performance is never linear with the spending.
>
> Right, I agree with that. The trick is just finding the right spot on
> that curve for each repo to maximize the reward/effort ratio.

Absolutely, at least for the default settings. However, this is not what
--aggressive is meant for.

> > > 1. Should we bump our default window size? The numbers above show that
> > > typical repos would benefit from jumping to 20 or even 40.
> >
> > I think this might be a good indication that the number of objects is a
> > bad metric to size the window, as I mentioned previously.
> >
> > Given that you have the test repos already, could you re-run it with
> > --window=1000 and play with --window-memory instead? I would be curious
> > to see if this provides more predictable results.
>
> It doesn't help. The git.git repo does well with about a 1m window
> limit. linux-2.6 is somewhere between 1m and 2m. But the phpmyadmin repo
> wants more like 16m. So it runs into the same issue as using object
> counts.
>
> But it's much, much worse than that. Here are the actual numbers (same
> format as before; left-hand column is either window size (if no unit) or
> window-memory limit (if k/m unit), followed by resulting pack size, its
> percentage of baseline --window=10 pack, the user CPU time and finally
> its percentage of the baseline):
> [...]

Ouch! Well... so much for good theory. I'm still really surprised and
disappointed, as I didn't expect such damage at all.

However, this is possibly a good baseline for determining a default value
for window-memory. Given your numbers, we clearly see that good
packing can be achieved with relatively little memory, and therefore it
might be a good idea not to leave this parameter unbounded by default, in
order to catch potential pathological cases. Maybe 64M would be a good
default value? Having a repack process eat up more than 16GB of RAM
because its RAM usage is unbounded is certainly not nice.
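
To make the effect of such a bound concrete, here is a toy Python model.
It is not the pack-objects code: `bounded_window`, `max_entries` and
`max_bytes` are made-up names standing in for --window and
--window-memory, and the sketch assumes the window simply drops its
oldest entries whenever either limit is exceeded. It reports how many
delta candidates each incoming object gets to see.

```python
from collections import deque


def bounded_window(sizes, max_entries, max_bytes):
    """Return, for each object, how many candidates were in the window.

    Hypothetical model only: `sizes` lists object sizes in bytes, in the
    order they enter the delta search. The window keeps previous objects
    and evicts the oldest ones when it holds more than `max_entries`
    objects or more than `max_bytes` bytes in total.
    """
    window = deque()
    used = 0
    depths = []
    for size in sizes:
        # Candidates available when this object is considered.
        depths.append(len(window))
        window.append(size)
        used += size
        # Evict oldest entries until both limits hold (keep at least one).
        while len(window) > 1 and (len(window) > max_entries or used > max_bytes):
            used -= window.popleft()
    return depths
```

With sizes [10, 10, 10, 100, 10], max_entries=3 and max_bytes=50, the
single 100-byte object flushes the window, so the object that follows it
sees one candidate instead of three. On repositories without such large
objects the byte limit never triggers and only the entry count matters,
which is the behaviour one would hope for from a bounded default such as
`git config pack.windowMemory 64m`.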
> > Maybe we could look at the size reduction within the delta search loop.
> > If the reduction quickly diminishes as tested objects are further away
> > from the target one then the window doesn't have to be very large,
> > whereas if the reduction remains more or less constant then it might be
> > worth searching further. That could be used to dynamically size the
> > window at run time.
>
> I really like the idea of dynamically sizing the window based on what we
> find. If it works. I don't think there's any reason you couldn't have 50
> absolutely terrible delta candidates followed by one really amazing
> delta candidate. But maybe in practice the window tends to get
> progressively worse due to the heuristics, and outliers are unlikely. I
> guess we'd have to experiment.

Yes. The idea is to keep searching as long as the results are not
getting progressively worse fast enough. Coming up with a good way to
infer that is far from obvious, though.
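
One possible shape for such a heuristic, purely as a sketch: always look
at a minimum number of candidates, then stop once the average reduction
over the last few candidates falls below some fraction of the best
reduction seen so far. Everything here is hypothetical (`adaptive_window`,
`probe`, `ratio` and the list of per-candidate reductions are invented for
illustration); the real delta search loop does not expose its results
this way.

```python
def adaptive_window(reductions, min_window=10, max_window=1000,
                    probe=5, ratio=0.5):
    """Return (candidates examined, best reduction found).

    Hypothetical input: `reductions[i]` is the size reduction obtained by
    deltifying the target against the i-th closest candidate.
    The search always covers `min_window` candidates. After that it stops
    as soon as the mean of the last `probe` reductions drops below
    `ratio` times the best reduction seen so far.
    """
    best = 0
    examined = 0
    for i, reduction in enumerate(reductions[:max_window]):
        examined = i + 1
        best = max(best, reduction)
        if examined >= min_window:
            recent = reductions[max(0, examined - probe):examined]
            if sum(recent) / len(recent) < ratio * best:
                break  # results are degrading fast: stop searching
    return examined, best
```

With reductions that decay quickly (100, 80, 60, ... down to 0) this
stops at the 10-candidate minimum, while a flat series of 50 equal
reductions is searched to the end. It also shows the weakness Jeff
describes: for one modest hit followed by 49 useless candidates and then
an excellent one, the sketch gives up after 10 candidates and never sees
the outlier.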
Nicolas