From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 29 Sep 2017 13:36:14 -0700
From: Jonathan Tan
To: Christian Couder
Cc: git, Junio C Hamano, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
 Mike Hommey, Lars Schneider, Eric Wong, Christian Couder
Subject: Re: [PATCH v6 09/40] Add initial external odb support
Message-Id: <20170929133614.1c0cd68ad80139fafdf68b86@google.com>
In-Reply-To:
References: <20170916080731.13925-1-chriscool@tuxfamily.org>
 <20170916080731.13925-10-chriscool@tuxfamily.org>
 <20170919104553.494d7d3b@twelve2.svl.corp.google.com>
X-Mailing-List: git@vger.kernel.org
On Wed, 27 Sep 2017 18:46:30 +0200 Christian Couder wrote:

> I am ok with splitting the patch series, but I am not sure that 01/40
> to 09/40 is the right range for the first patch series.
> I would say that 01/40 to 07/40 is better, as it can be seen as a
> separate refactoring.

I mentioned 09/40 because this is (as far as I can tell) the first one
that introduces a new design.

> I don't think single-shot processes would be a huge burden, because
> the code is simpler, and because, for example, for filters we already
> have single-shot and long-running processes and no one complains about
> that. This code is useful as it makes it much easier for people to do
> some things (see the clone bundle example).
>
> In fact, in Git development we usually start by first implementing
> simpler single-shot solutions, before thinking, when the need arises,
> about making them faster. So perhaps an equally valid approach could
> be to first submit only the patches for the single-shot protocol and
> later submit the rest of the series once we start getting feedback
> about how external odbs are used.

My concern is that, as far as I understand the Microsoft use case, we
already know that we need the faster solution, so the need has already
arisen.

> And yeah, I could change the order of the patch series to implement
> the long-running processes first and the single-shot process last, so
> that it would be possible to first get feedback about the long-running
> processes before we decide whether to merge the single-shot stuff, but
> I don't think that would be the most logical order.

My thinking was that we would just implement the long-running process
and not implement the single-shot process at all (besides maybe a
script in contrib/). If we are going to do both anyway, I agree that we
should do the single-shot process first.

> > And another possible issue is that we design ourselves into a
> > corner.
> >
> > Thinking about the use cases that I know about (the Android use case
> > and the Microsoft GVFS use case), I don't think we are doing that -
> > for Android, this means that large blob metadata needs to be part of
> > the design (and this patch series does provide for that), and for
> > Microsoft GVFS, "get" is relatively cheap, so a configuration option
> > to not invoke "have" first when loading a missing object might be
> > sufficient.
>
> If the helper does not advertise the "have" capability, the "have"
> instruction will not be sent to the helper, so the current design
> already works for that case.

Ah, that's good to know.
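Just to check that I understand the intended behavior, here is a tiny
illustrative sketch of what I picture on the caller side - this is not
code from your series, and all of the names here are made up:

struct odb_helper {
	const char *name;
	unsigned supports_get : 1;	/* parsed from the helper's capability list */
	unsigned supports_have : 1;
};

/* Stub standing in for whatever actually talks to the helper. */
static int send_have_request(struct odb_helper *o, const unsigned char *sha1)
{
	(void)o;
	(void)sha1;
	return 0;
}

static int helper_has_object(struct odb_helper *o, const unsigned char *sha1)
{
	/*
	 * A helper that never advertised "have" is never asked; the
	 * caller then falls back to a plain "get" when the object is
	 * actually needed.
	 */
	if (!o->supports_have)
		return 0;
	return send_have_request(o, sha1);
}

In other words, a helper that only advertises "get" would simply never
be asked "have", and we would go straight to "get" when an object turns
out to be missing. If that matches your design, then yes, that covers
the GVFS-like case.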
> > And I think that my design can be extended to support a use case in
> > which, for example, blobs corresponding to a certain type of
> > filename (defined by a glob, as in gitattributes) can be excluded
> > during fetch/clone, much like --blob-max-bytes, and they can be
> > fetched either through the built-in mechanism or through a custom
> > hook.
>
> Sure, we could probably rebuild something equivalent to what I did on
> top of your design.
> My opinion, though, is that if we want to eventually get to the same
> goal, it is better to first merge something that gets us very close to
> the end goal and then add some improvements on top of it.

I agree - I mentioned that because I personally prefer to review
smaller patch sets at a time, and my patch set already includes a lot
of the same infrastructure needed by yours - for example, the places in
the code to dynamically fetch objects, exclusion of objects when
fetching or cloning, configuring the cloned repo when cloning, fsck,
and gc.

> > - I get compile errors when I "git am" these onto master. I think
> >   '#include "config.h"' is needed in some places.
>
> It's strange, because I get no compile errors even after a "make
> clean" from my branch.
> Could you show the actual errors?

I don't have the error messages with me now, but it was something about
a function being implicitly declared. You will probably get these
errors if you sync past commit e67a57f ("config: create config.h",
2017-06-15).

> > Any reason why you prefer to update the loose object functions
> > rather than the generic one (sha1_object_info_extended)? My concern
> > with just updating the loose object functions was that a caller
> > might have obtained the path by iterating through the loose object
> > dirs, and in that case we shouldn't query the external ODB for
> > anything.
>
> You are thinking about fsck or gc?
> Otherwise I don't think it would be clean to iterate through loose
> object dirs.

Yes, fsck and gc (well, prune, I think) do that. I agree that Git
typically doesn't do that (except for exceptional cases like fsck and
gc), but I was thinking about supporting existing code that does that
iteration, not introducing new code that does it.
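To make the placement I have in mind concrete, here is a rough,
self-contained sketch - not code from either series, and everything
except the idea of sha1_object_info_extended being the generic entry
point is an illustrative stand-in:

#include <stddef.h>

/* Stand-ins for Git's real types and lookups; illustrative only. */
struct obj_info { size_t size; int type; };

static int lookup_packed_or_loose(const unsigned char *sha1,
				  struct obj_info *oi)
{
	(void)sha1;
	(void)oi;
	return -1;	/* pretend the object is not in the local store */
}

static int external_odb_get(const unsigned char *sha1)
{
	(void)sha1;
	return -1;	/* pretend no helper could provide it either */
}

/*
 * Generic lookup: this is the only place that consults the external
 * ODB.  Code that enumerates .git/objects/?? itself (fsck, prune)
 * never goes through here, so it never triggers a fetch from a helper.
 */
static int object_info_generic(const unsigned char *sha1,
			       struct obj_info *oi)
{
	if (!lookup_packed_or_loose(sha1, oi))
		return 0;
	if (external_odb_get(sha1))
		return -1;
	/* the helper has (hopefully) stored the object locally; retry */
	return lookup_packed_or_loose(sha1, oi);
}

With that shape, the loose-object functions stay purely local, and only
callers of the generic lookup can ever cause a helper to be invoked.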