From: Nicolas Pitre
Subject: Re: [RFC] super indexes to span multiple packfiles
Date: Tue, 29 May 2007 12:19:13 -0400 (EDT)
To: Jon Smirl
Cc: "Shawn O. Pearce", git@vger.kernel.org, Dana How
References: <20070529071622.GA8905@spearce.org> <9e4733910705290905m66dd3081ubda9b92a707fc903@mail.gmail.com>
In-reply-to: <9e4733910705290905m66dd3081ubda9b92a707fc903@mail.gmail.com>

On Tue, 29 May 2007, Jon Smirl wrote:

> Objects are not accessed in random order with git. Once an object
> reference hits a pack file it is very likely that following references
> will hit the same pack file. That's because you always find object
> SHAs by following the chains.
>
> So the first place to look for an object is the same place the previous
> object was found. If it isn't there, order the search of the pack files
> by creation date (just a heuristic). Make this list a circle and start
> the search in the pack where the previous object was found. This can
> all be done with the existing indexes.
>
> I haven't been reading all of the messages on this subject, but is
> this strategy enough to eliminate the need for a super index?

I think it could.

Personally I'm not a big fan of the super index notion. It needs extra
maintenance to keep in synch, and when it is not in synch it requires
extra work at run time to fall back to traditional lookup. And Shawn's
testing didn't show significant performance gains either.

But a simple heuristic like the presumption that the next object is
likely to be in the same pack as the previous one is the kind of thing
that could provide significant improvements with very little effort.


Nicolas
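
For illustration only, here is a rough sketch of the "try the previous pack first" heuristic discussed above. It is not git code: the Pack and PackSearch names, the in-memory set standing in for a pack index, and the probes counter are all assumptions made up for the example, and the newest-first ordering is just one reading of "order by creation date".

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    """Hypothetical stand-in for one packfile and its existing index."""
    name: str
    ctime: int                                  # creation time, used only for ordering
    objects: set = field(default_factory=set)   # object names listed in the index


class PackSearch:
    """Look up objects, starting from the pack that served the last hit."""

    def __init__(self, packs):
        # Assumption: newest pack first. The ordering is only a heuristic.
        self.packs = sorted(packs, key=lambda p: p.ctime, reverse=True)
        self.last = 0     # position of the pack that satisfied the last lookup
        self.probes = 0   # number of per-pack index lookups performed

    def find(self, sha):
        n = len(self.packs)
        # Treat the ordered list as a circle and begin at the previous hit.
        for step in range(n):
            i = (self.last + step) % n
            self.probes += 1
            if sha in self.packs[i].objects:
                self.last = i
                return self.packs[i]
        return None       # not in any pack; a real caller would try elsewhere


packs = [Pack("old", 1, {"a1", "a2", "a3"}), Pack("new", 2, {"b1"})]
search = PackSearch(packs)
for sha in ["a1", "a2", "a3"]:
    search.find(sha)
print(search.probes)      # 4 probes here, versus 6 if every lookup started at "new"
```

Each pack is still searched through its own index, so nothing extra has to be kept in synch; the only new state is the position of the last hit.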