From: Nicolas Pitre
Subject: Re: gc --aggressive
Date: Tue, 01 May 2012 13:17:03 -0400 (EDT)
To: Jeff King
Cc: git@vger.kernel.org, Matthieu Moy, Jay Soffian, Junio C Hamano,
 Shawn Pearce
In-reply-to: <20120501162806.GA15614@sigill.intra.peff.net>

On Tue, 1 May 2012, Jeff King wrote:

> On Sun, Apr 29, 2012 at 09:53:31AM -0400, Nicolas Pitre wrote:
> 
> > But my remark was related to the fact that you need to double the
> > affected resources to gain marginal improvements at some point.
> > This is true about computing hardware too: eventually you need way
> > more gates and spend much more $$$ to gain some performance, and
> > the added performance is never linear with the spending.
> 
> Right, I agree with that. The trick is just finding the right spot on
> that curve for each repo to maximize the reward/effort ratio.

Absolutely, at least for the default settings.  However, this is not
what --aggressive is meant to be.

> > > 1. Should we bump our default window size? The numbers above show
> > >    that typical repos would benefit from jumping to 20 or even 40.
> > 
> > I think this might be a good indication that the number of objects
> > is a bad metric to size the window, as I mentioned previously.
> > 
> > Given that you have the test repos already, could you re-run it
> > with --window=1000 and play with --window-memory instead?  I would
> > be curious to see if this provides more predictable results.
> 
> It doesn't help. The git.git repo does well with about a 1m window
> limit. linux-2.6 is somewhere between 1m and 2m. But the phpmyadmin
> repo wants more like 16m. So it runs into the same issue as using
> object counts.
> 
> But it's much, much worse than that. Here are the actual numbers
> (same format as before; the left-hand column is either a window size
> (if no unit) or a window-memory limit (if a k/m unit), followed by
> the resulting pack size, its percentage of the baseline --window=10
> pack, the user CPU time, and finally its percentage of the baseline):
> [...]

Ouch!  Well... so much for good theory.  I'm still really surprised
and disappointed, as I didn't expect such damage at all.
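(For those who want to reproduce this kind of measurement, I'm
assuming it was done with something along these lines; --window,
--window-memory and -f are all standard git-repack options, and the
cap can also be made persistent through the pack.windowMemory config
variable:)

	# baseline
	/usr/bin/time git repack -a -d -f --window=10

	# large window, bounded by memory rather than object count
	/usr/bin/time git repack -a -d -f --window=1000 --window-memory=1m

	# equivalent persistent setting
	git config pack.windowMemory 1m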
However, this may be a good baseline for determining a default value
for window-memory.  Your numbers clearly show that good packing can be
achieved with relatively little memory, so it might be a good idea not
to leave this parameter unbounded by default, if only to catch
potential pathological cases.  Maybe 64M would be a good default
value?  Having a repack process eat up more than 16GB of RAM simply
because its memory usage is unbounded is certainly not nice.

> > Maybe we could look at the size reduction within the delta search
> > loop.  If the reduction quickly diminishes as tested objects are
> > further away from the target one then the window doesn't have to
> > be very large, whereas if the reduction remains more or less
> > constant then it might be worth searching further.  That could be
> > used to dynamically size the window at run time.
> 
> I really like the idea of dynamically sizing the window based on what
> we find. If it works. I don't think there's any reason you couldn't
> have 50 absolutely terrible delta candidates followed by one really
> amazing delta candidate. But maybe in practice the window tends to
> get progressively worse due to the heuristics, and outliers are
> unlikely. I guess we'd have to experiment.

Yes.  The idea is to keep searching as long as the results are not
getting worse fast enough.  Coming up with a good way to infer that is
far from obvious, though.

Nicolas
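P.S.  To make the idea a bit more concrete, here is a rough sketch of
the kind of early-termination heuristic I have in mind.  This is not
actual pack-objects code; the types and names are made up purely for
illustration:

	#include <stddef.h>

	struct candidate;	/* stand-in for one entry in the delta window */

	/*
	 * Hypothetical helper: size of the best delta of 'target'
	 * against 'c', or (size_t)-1 if no usable delta was found.
	 */
	extern size_t try_delta_size(struct candidate *target,
				     struct candidate *c);

	static size_t best_delta_in_window(struct candidate *target,
					   struct candidate **window,
					   int max_window)
	{
		size_t best = (size_t)-1;
		int since_improvement = 0;
		int i;

		for (i = 0; i < max_window; i++) {
			size_t d = try_delta_size(target, window[i]);

			if (d < best) {
				best = d;
				since_improvement = 0;
			} else {
				since_improvement++;
			}

			/*
			 * Keep searching as long as candidates still
			 * improve the result once in a while; give up
			 * after N consecutive candidates that brought
			 * no improvement.  Choosing N, and whether
			 * "no improvement" should really be
			 * "improvement below some fraction of 'best'",
			 * is exactly the non-obvious part.
			 */
			if (since_improvement >= 16)
				break;
		}
		return best;
	}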