From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Dana How"
Subject: Re: [PATCH 1/3] Lazily open pack index files on demand
Date: Sat, 26 May 2007 21:40:42 -0700
Message-ID: <56b7f5510705262140rea5e1e5r49bdd5e99c466daa@mail.gmail.com>
References: <20070526052419.GA11957@spearce.org> <7vabvsm1h8.fsf@assigned-by-dhcp.cox.net> <56b7f5510705261031o311b89bapd730374cbc063931@mail.gmail.com> <20070527033429.GY28023@spearce.org>
In-Reply-To: <20070527033429.GY28023@spearce.org>
To: "Shawn O. Pearce"
Cc: "Junio C Hamano" , git@vger.kernel.org, danahow@gmail.com

On 5/26/07, Shawn O. Pearce wrote:
> Dana How wrote:
> > Shawn: When I first saw the index-loading code, my first
> > thought was that all the index tables should be
> > merged (easy since sorted) so callers only need to do one search.
>
> Yes; in fact this has been raised on the list before.  The general
> idea was to create some sort of "super index" that had a list of
> all objects and which packfile they could be found in.  This way the
> running process doesn't have to search multiple indexes, and the
> process doesn't have to be responsible for the merging itself.
>
> See the thing is, if you read all of every .idx file on a simple
> `git-log` operation you've already lost.  The number of trees and
> blobs tends to far outweigh the number of commits and they really
> outweigh the number of commits the average user looks at in a
> `git-log` session before they abort their pager.  So sorting all
> of the available .idx files before we produce even the first commit
> is a horrible thing to do.
>
> But the problem with a super index is repacking.  Every time the user
> repacks their recent loose objects (or recently fetched packs) we are
> folding some packfiles together, but may be leaving others alone.
> The super index would need to account for the packfiles we aren't
> looking at or repacking.
> It gets complicated fast.
>
> There's also the problem of alternate ODBs; do we fold the indexes
> of our alternates into our own super index?  Or does each ODB get
> its own super index and we still have to load multiple super index
> files?

Yes, the problem is that even an on-demand, "lazy" merge is
likely to require far more work than the expected number of index probes.

> In pack v4 we're likely to move the SHA-1 table from the .idx file
> into the front of the .pack file.  This makes the .idx file hold
> only the offsets and the CRC checksums of each object.  If we start
> making a super index, we have to duplicate the SHA-1 table twice
> (once in the .pack, again in the super index).

Hmm, hopefully the SHA-1 table can go at the _end_ since
with split packs that's the only time we know the number
of objects in the pack... ;-)

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell
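
Illustrative note (not code from this thread or from git): a minimal Python sketch of the trade-off discussed above, assuming each .idx file is modeled as nothing more than a sorted list of object names. The names `build_super_index`, `find_pack`, and `find_pack_per_index` are hypothetical. Building the merged table touches every entry of every index, which is the up-front cost the thread objects to; probing each index separately costs one binary search per pack and reads nothing else.

```python
import bisect
import heapq


def _tagged(pack, names):
    """Yield (object_name, pack_name) pairs for one sorted index."""
    for name in names:
        yield (name, pack)


def build_super_index(pack_indexes):
    """Merge per-pack sorted name lists into one sorted table.

    pack_indexes maps a pack name to its sorted list of object names
    (a hypothetical stand-in for the SHA-1 table of a .idx file).
    Every entry of every index is visited once.
    """
    streams = [_tagged(pack, names) for pack, names in pack_indexes.items()]
    return list(heapq.merge(*streams))


def find_pack(super_index, name):
    """Single binary search in the merged table."""
    # (name,) sorts before any (name, pack) pair with the same name.
    i = bisect.bisect_left(super_index, (name,))
    if i < len(super_index) and super_index[i][0] == name:
        return super_index[i][1]
    return None


def find_pack_per_index(pack_indexes, name):
    """One binary search per pack, with no merging beforehand."""
    for pack, names in pack_indexes.items():
        i = bisect.bisect_left(names, name)
        if i < len(names) and names[i] == name:
            return pack
    return None
```

With only a handful of lookups, as in a short `git-log` session, the per-index probes are likely to do far less total work than the merge, which is the point made in the reply above.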