From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Shawn O. Pearce"
Subject: Re: [PATCH WIP 0/4] Special code path for large blobs
Date: Tue, 2 Jun 2009 07:45:55 -0700
Message-ID: <20090602144555.GH30527@spearce.org>
References: <1243488550-15357-1-git-send-email-pclouds@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Nicolas Pitre, git@vger.kernel.org
To: Nguyen Thai Ngoc Duy
X-From: git-owner@vger.kernel.org Tue Jun 02 16:46:04 2009
Return-path:
Envelope-to: gcvg-git-2@gmane.org
Received: from vger.kernel.org ([209.132.176.167])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1MBVFb-0000zF-Ah
	for gcvg-git-2@gmane.org; Tue, 02 Jun 2009 16:46:03 +0200
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754392AbZFBOpz (ORCPT );
	Tue, 2 Jun 2009 10:45:55 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753507AbZFBOpy (ORCPT );
	Tue, 2 Jun 2009 10:45:54 -0400
Received: from george.spearce.org ([209.20.77.23]:33807 "EHLO
	george.spearce.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752985AbZFBOpx (ORCPT );
	Tue, 2 Jun 2009 10:45:53 -0400
Received: by george.spearce.org (Postfix, from userid 1001)
	id E3805381D1; Tue, 2 Jun 2009 14:45:55 +0000 (UTC)
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.17+20080114 (2008-01-14)
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
Archived-At:

Nguyen Thai Ngoc Duy wrote:
> 2009/5/29 Nicolas Pitre:
> > However, like I said previously, I'd encapsulate large blobs in a pack
> > right away instead of storing them as loose objects. The reason is that
> > you can effortlessly repack/fetch/push them afterwards by simply
> > triggering the pack data reuse code path for them. Extracting large and
> > undeltified blobs from a pack is just as easy as from a loose object.
>
> Makes sense. And the code looks nice too.
>
> > To accomplish that, you only need to copy write_pack_file() from
> > builtin-pack-objects.c and strip it to the bone with only one object to
> > write.
>
> write_pack_file() is too scary to me, I ripped from fast-import.c
> instead. BTW, how does git handle hundreds of single object packs? I
> don't know if prepare_packed_git scales in such cases.

Yea, it's not going to do that great. We may be able to improve
that code path by sorting any pack whose index is really small and
whose pack file is really big to the end of the list, where it's
least likely to be matched, so we don't even bother to load the
index into memory during normal commit traversal.

But even with that sorting, it's still going to suck. Lookup for
a large binary is O(N), where N is the number of large binary
*revisions*. Yuck.

Really, objects in the 200MB+ range probably should each just be in
a lone file named by its SHA-1... aka, a loose object. Combining
them into a pack is going to be potentially expensive disk IO wise,
and may not gain you very much (if it's over 200 MB compressed with
deflate, it's likely already-compressed binary content that may not
delta well).

Way back we had that pack-style loose object format, for exactly
these sorts of files, and exactly to avoid having many packs of
just 1 object, but that didn't go anywhere... indeed, Nico deleted
the code that creates that format.

-- 
Shawn.
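
The sorting idea and its O(N) limit can be illustrated with a toy
model. This is only a sketch in Python, not git's C code: the Pack
class, the two thresholds, and the function names are all hypothetical,
since the mail says only "really small" index and "really big" pack
file. It assumes packs are probed one after another until the object
is found.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the mail gives no concrete numbers.
SMALL_INDEX_BYTES = 4096
BIG_PACK_BYTES = 200 * 1024 * 1024


@dataclass(frozen=True)
class Pack:
    """Toy stand-in for a pack: sizes on disk plus the object ids it holds."""
    name: str
    index_bytes: int
    pack_bytes: int
    objects: frozenset


def looks_like_large_blob_pack(pack):
    """Tiny index plus huge pack file: probably a pack holding one big blob."""
    return (pack.index_bytes <= SMALL_INDEX_BYTES
            and pack.pack_bytes >= BIG_PACK_BYTES)


def order_packs(packs):
    """Move probable large-blob packs to the end of the search list.

    sorted() is stable and False sorts before True, so ordinary packs
    keep their relative order and stay at the front.
    """
    return sorted(packs, key=looks_like_large_blob_pack)


def find_object(packs, object_id):
    """Probe packs in order; return (pack name or None, packs probed)."""
    probes = 0
    for pack in packs:
        probes += 1
        if object_id in pack.objects:
            return pack.name, probes
    return None, probes
```

With three single-blob packs listed ahead of one ordinary pack, a
commit lookup probes four packs before sorting and one after. A lookup
for the last large blob still probes every pack, which is the O(N)
cost that the sorting does not remove.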