From: Shawn Pearce
Subject: Re: Packfile can't be mapped
Date: Mon, 28 Aug 2006 12:42:22 -0400
Message-ID: <20060828164222.GA22451@spearce.org>
References: <9e4733910608271804j762960a8ud83654c78ebe009a@mail.gmail.com> <20060828024720.GD24204@spearce.org>
To: Nicolas Pitre
Cc: git@vger.kernel.org, Jon Smirl

Nicolas Pitre wrote:
> On Sun, 27 Aug 2006, Shawn Pearce wrote:
>
> > I'm going to try to get tree deltas written to the pack sometime this
> > week.  That should compact this intermediate pack down to something
> > that git-pack-objects would be able to successfully mmap into a
> > 32 bit address space.  A complete repack with no delta reuse will
> > hopefully generate a pack closer to 400 MB in size.  But I know
> > Jon would like to get that pack even smaller.  :)
>
> One thing to consider in your code (if you didn't implement that
> already) is to _not_ attempt any delta on any object whose size is
> smaller than 50 bytes, and then limit the maximum delta size to
> object_size/2 - 20 (use that for the last argument to diff-delta() and
> store the undeltified object when diff-delta returns NULL).  This way
> you'll avoid creating delta objects that are most likely to end up being
> _larger_ than the undeltified object.

So I added Nico's suggestions to fast-import and ran it on a small
subset of the Mozilla repository (3424 blobs):

  naive always delta:  6652 KiB
  Nico's suggestion:   6842 KiB

So Nico's suggestion of limiting delta size to (orig_len/2)-20 and
not using deltas on blobs < 50 bytes actually added 190 KiB to the
output pack.  Since this sample is probably fairly representative of
the rest of the repository's blobs, I'm thinking we may see a 2.8%
increase in size over the current 930 MB blob pack.  That's another
26 MB in our intermediate pack.

I don't think this suggestion is really worth including in fast-import
right now...

-- 
Shawn.
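
For readers following the thread, here is a minimal sketch of the
heuristic discussed above and of the size projection. It is not the
fast-import code. The names choose_representation, max_delta_size,
toy_delta and projected_growth are hypothetical, toy_delta is only a
stand-in for diff-delta() (shared-prefix length plus the remaining
tail), and integer division for object_size/2 is an assumption.

```python
from typing import Callable, Optional, Tuple

# Threshold from the suggestion: never try a delta below this size.
MIN_DELTA_CANDIDATE_SIZE = 50  # bytes

# Hypothetical signature for a delta generator: (base, target, max_size).
DeltaFn = Callable[[bytes, bytes, int], Optional[bytes]]


def max_delta_size(object_size: int) -> int:
    """Cap from the suggestion: object_size/2 - 20 (integer division assumed)."""
    return object_size // 2 - 20


def toy_delta(base: bytes, target: bytes, max_size: int) -> Optional[bytes]:
    """Hypothetical stand-in for diff-delta(), NOT the real algorithm.

    Encodes the length of the prefix shared with base (4 bytes) followed
    by the rest of target. Returns None when the result exceeds max_size,
    mirroring the "returns NULL" behaviour described in the message.
    """
    prefix = 0
    for a, b in zip(base, target):
        if a != b:
            break
        prefix += 1
    delta = prefix.to_bytes(4, "big") + target[prefix:]
    return delta if len(delta) <= max_size else None


def choose_representation(
    target: bytes, base: bytes, make_delta: DeltaFn = toy_delta
) -> Tuple[str, bytes]:
    """Apply the two rules: skip tiny objects, cap the delta size."""
    if len(target) < MIN_DELTA_CANDIDATE_SIZE:
        return ("full", target)  # too small to be worth a delta
    delta = make_delta(base, target, max_delta_size(len(target)))
    if delta is None:
        return ("full", target)  # delta over the cap: store undeltified
    return ("delta", delta)


def projected_growth(
    naive_kib: float, limited_kib: float, full_pack_mb: float
) -> Tuple[float, float]:
    """Scale the sample's relative growth up to the full pack size."""
    ratio = (limited_kib - naive_kib) / naive_kib
    return ratio, ratio * full_pack_mb


if __name__ == "__main__":
    base = b"a" * 200
    print(choose_representation(b"a" * 190 + b"b" * 10, base)[0])
    print(choose_representation(b"y" * 200, base)[0])
    print(projected_growth(6652, 6842, 930))
```

With the sample figures from the message, projected_growth(6652, 6842,
930) gives a ratio a little under 2.9% and a growth between 26 and
27 MB, in line with the estimate above.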