From: Jeff King
Subject: Re: [PATCH] git exproll: steps to tackle gc aggression
Date: Sat, 10 Aug 2013 05:50:00 -0400
Message-ID: <20130810095000.GC2518@sigill.intra.peff.net>
References: <7va9ksbqpl.fsf@alter.siamese.dyndns.org> <7v61vgazp5.fsf@alter.siamese.dyndns.org> <7vwqnw7z47.fsf@alter.siamese.dyndns.org> <20130809110000.GD18878@sigill.intra.peff.net> <20130809221615.GA7160@sigill.intra.peff.net>
To: Duy Nguyen
Cc: Ramkumar Ramachandra, Junio C Hamano, Martin Fick, Git List

On Sat, Aug 10, 2013 at 08:24:39AM +0700, Nguyen Thai Ngoc Duy wrote:
> > the other end cannot use). You _might_ be able to get by with a kind of
> > "two-level" hack: consider your main pack as "group A" and newly pushed
> > packs as "group B".
> > Allow storing thin deltas on disk from group B
> > against group A, but never the reverse (nor within group B). That makes
> > sure you don't have cycles, and it eliminates even more I/O than any
> > repacking solution (because you never write the extra copy of Y to disk
> > in the first place). But I can think of two problems:
> [...]
>
> Some refinements on this idea
>
>  - We could keep packs in group B ordered as the packs come in. The
>    new pack can depend on the previous ones.

I think you could dispense with the two-level scheme altogether and simply
give a definite ordering to packs, whereby newer packs can only depend on
older packs. Enforcing that with filesystem mtime feels a bit error-prone;
I think you'd want to explicitly store a counter somewhere.

>  - A group index in addition to separate index for each pack would
>    solve linear search object lookup problem.

Yeah. I do not even think it would be that much work. It is a pure
optimization, so you can ignore issues like "what happens if I search for
an object, but the pack it is supposed to be in went away?". The answer is
"you fall back to a linear search through the packs", and assume it
happens infrequently enough not to care.

I'd wait to see how other proposed optimizations work out before doing a
global index, though. The current wisdom is "don't have a ton of packs",
both for the index issue and for other reasons, like wasted space and
on-the-fly deltas for fetches. If the "other reasons" go away, then a
global index would make sense to solve the remaining issue. But if the
solution for the other issues is to make it cheaper to repack so you can
do it more often, then the index issue just goes away.

-Peff
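A minimal sketch of the "newer packs can only depend on older packs" rule, assuming each pack carries a hypothetical integer `seq` written explicitly when the pack is created (the stored counter suggested above, rather than filesystem mtime). Packs are modeled as plain dicts from object id to delta base; `Pack`, `next_seq`, and `ordering_violations` are illustrative names, not git's real data structures or functions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pack:
    name: str
    # Hypothetical explicit counter, written when the pack is created,
    # instead of relying on filesystem mtime.
    seq: int
    # object id -> delta base id, or None for a full (non-delta) object.
    objects: dict[str, Optional[str]] = field(default_factory=dict)


def next_seq(packs: list[Pack]) -> int:
    """Sequence number for an incoming pack: newer than every existing one."""
    return max((p.seq for p in packs), default=-1) + 1


def ordering_violations(packs: list[Pack]) -> list[tuple[str, str, str]]:
    """Return (pack, object, base) for every cross-pack delta whose base
    is not available in some strictly older pack."""
    bad = []
    for p in packs:
        for oid, base in p.objects.items():
            if base is None or base in p.objects:
                continue  # full object, or an ordinary in-pack delta
            if not any(q.seq < p.seq and base in q.objects for q in packs):
                bad.append((p.name, oid, base))
    return bad
```

Every permitted cross-pack edge strictly lowers `seq`, so a chain of such edges is finite and can never loop back on itself; in-pack deltas are assumed acyclic already, since the pack writer is responsible for them. A newly pushed pack takes `next_seq(...)` and may delta against any existing pack, which also covers the refinement of letting each new pack depend on the previous ones.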
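A minimal sketch of the global index as a "pure optimization", assuming a hypothetical `GroupIndex` that maps each object id to the pack that last held it. The mapping is treated only as a hint: if the hinted pack has gone away, lookup falls back to a linear search through the packs, as described in the reply. Pack contents are modeled as in-memory sets rather than git's actual per-pack `.idx` files.

```python
from typing import Optional


class GroupIndex:
    """Hypothetical global index mapping object id -> pack name.

    The mapping is only a hint; the per-pack contents stay authoritative.
    """

    def __init__(self, packs: dict[str, set[str]]):
        # packs: pack name -> object ids it holds (stand-in for per-pack .idx files).
        # The dict is shared, so packs added or removed later are visible here.
        self.packs = packs
        self.hint = {oid: name for name, objs in packs.items() for oid in objs}
        self.fallbacks = 0  # how often the hint was stale or missing

    def lookup(self, oid: str) -> Optional[str]:
        name = self.hint.get(oid)
        if name is not None and oid in self.packs.get(name, set()):
            return name  # fast path: the hinted pack still has the object
        # Slow path: the hinted pack went away (or the object is unknown),
        # so fall back to a linear search through all remaining packs.
        self.fallbacks += 1
        for name, objs in self.packs.items():
            if oid in objs:
                self.hint[oid] = name  # repair the hint for next time
                return name
        return None
```

The fast path is a single dictionary probe regardless of how many packs exist; the slow path is just today's linear scan, which stays correct and is expected to be rare enough not to matter.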