From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jens Lehmann Subject: Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail Date: Mon, 16 Nov 2015 22:42:09 +0100 Message-ID: <564A4DB1.4070507@web.de> References: <1447321061-74381-1-git-send-email-larsxschneider@gmail.com> <20151113053547.GD29708@sigill.intra.peff.net> <20151113233807.GD16173@sigill.intra.peff.net> <20151113234116.GA18234@sigill.intra.peff.net> <564A279C.6000802@web.de> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Jeff King , Lars Schneider , "git@vger.kernel.org" , Junio C Hamano , Duy Nguyen To: Stefan Beller X-From: git-owner@vger.kernel.org Mon Nov 16 22:42:28 2015 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1ZyRXe-0004NO-PL for gcvg-git-2@plane.gmane.org; Mon, 16 Nov 2015 22:42:27 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752181AbbKPVmW (ORCPT ); Mon, 16 Nov 2015 16:42:22 -0500 Received: from mout.web.de ([212.227.15.3]:62298 "EHLO mout.web.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751914AbbKPVmV (ORCPT ); Mon, 16 Nov 2015 16:42:21 -0500 Received: from [192.168.178.41] ([79.211.113.40]) by smtp.web.de (mrweb002) with ESMTPSA (Nemesis) id 0MDPUT-1aAr3m0OvF-00Gq79; Mon, 16 Nov 2015 22:42:11 +0100 User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 In-Reply-To: X-Provags-ID: V03:K0:CXLki7K77bHJjmp1aoNUs0QDwxMplREu5gKP9+jHoqHoca+uyNK PsY0tQ4sZ6fJYmV3qXqPpnIASz/I2UtSkw6IrLxu0isseL1R1F2LmwZKt3sSBoM6a8MSa2w s4oFnxn0t+tEYQomvFjYbRAjkehPUkXtjdI5ukR85c7WvJsdAvkgclPh7YHuQddF24Zt9r/ ALT1UDBLE7ZJQeCqZN4CQ== X-UI-Out-Filterresults: notjunk:1;V01:K0:0WXeFkZbYYI=:fAlV8VmFaPi6SitUiaeRUL E6dz4+KJcRBMuhujfjYv341Wx2rJ3aIg9WhBincKyS1N8WS5+2JvYqFSlEADub5rjA3gJqFDH +Ty+9QwNwp+E2oyt+sJR/nQoJKfzmutWv6y0nCK/ntJSYJgyHTvUv7Fz8NI5jYuNTmNR4NXc5 nx+/w3q36stqbOsYBY9H6aGoP2TBVSZExEe6bTduNZdWPTniQ6KFprxFzBQYsd8tCiDv0vlrG OtlY0Q4kBzee2Kq3H44x43hDEKVCN5MPAz1YRUsBnjM7jAu9G64B6LdwU6H2GNGa1IKq6GK5A 1WPSc3z32tfHmoTLbJae2aYNNrNRbcz+ExbL6SJBnnF+sk9zWpEYm4AAQO6zi4ttldKDmlUbN vzWIq6MAKGB/KJJvOs541X01aL1mApVzSrhnvmTmis3Ln1C4AQDdaQXwuHyduCAJ2PuMuzeLW V7V4Qp2RTsERk04kCbRNfai8ynYonDDZCCJPED8KSIsK4yJ0b0gsjFyi+cFqg+p9RQXt2w8j6 PzuV34la6R9hy9ylnXaUOvU4vqRjbiAyARlkNMqBMrY/FSyWP3+NMoxsF6VgtKUmpy5mAQJs4 8z1Vk8YN7R0jmBUKOEt3sNdv86e6f4SkQ9KrQJuLzsyPAOktaxnps2YizDGgBoWPPO5WRowey JnTvoaGj2FerXOMfabXgBg2qo6HNLsHpkFFjwseMa1y5DmnwRGBOfgW5zwqMPln6EZdgIMVCl CtqeR0fU8B0kFkDngrYWAVd7WRzNTweawH1hJpiBAm1TdRrsGUg5vUHBVKXLdYGn8bPrxerk Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Am 16.11.2015 um 20:25 schrieb Stefan Beller: > On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann wrote: >> Am 14.11.2015 um 01:10 schrieb Stefan Beller: >>> Thanks for pointing out that we already have some kind of server support. >>> >>> I wonder if we should add an additional way to make fetching only some >>> sha1s possible. ("I don't want users to fetch any sha1, but only those >>> where superprojects point{ed} to", even if you force push a superproject, >>> you want to want to only allow fetching all sha1s which exist in the >>> current >>> superprojects branch.) >> >> >> Me thinks the restrictions for sha1-fetching could come from the branches >> these sha1s are found in the upstream submodule: if the client is allowed >> to fetch a branch, it should be able to fetch any sha1 on that branch. > > I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant > set, is not sufficient though. > > To fetch an arbitrary sha1, you would need to check if that sha1 is part > of the history of any advertised branch and then allow fetching serverside, > which sounds like some work for the server, which we may want to avoid > by having smarter data structures there. > > Instead of having to search all branches for the requested sha1, we could have > some sort of data structure to make it not an O(n) operation (n being > all objects > in the repo). > > Maybe I overestimate the work which needs to be done, because the server has > bitmaps nowadays. > > Maybe a lazy reverse-pointer graph can be established on the serverside. > So I guess when we add the feature to fetch arbitrary sha1s, reachable from > any branch, people using submodules will make use of the feature. (such as with > git fetch --recurse --depth 1 or via a new `git fetch --recursive > --up-to-submodule-tip-only`) > > So once the server is asked for a certain sha1, it will do the > reachability check, > which takes some effort, but then stores the result in the form: > "If ${current tip sha} of ${branch} is reachable, so is requested $sha1." > > So when the next fetch request for $sha1 arrives, the server only needs to > check for ${current tip sha} to be part of $branch, which is expected to be > a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap could > just tell you or at least shorten the walk even more) > If the ${branch} has changed, the next evaluation for $sha1 can update > the cache, > such that the reverse lookup is not expensive on expectation. Makes sense, although I do not know enough about the server side to tell if it would need such an optimization or will cope with the load just fine. But even if we'd enable such a feature without having to set an extra config option, a submodule fetch asking for certain sha1s would have to fall back to a simple "fetch all" like we do now when the server doesn't support that for backwards compatibility. But maybe that's just obvious. > I assume this will mostly be used with submodules, so only a few sha1s need > this caching. I won't bet on that, some of the submodules at $DAYJOB are rather busy and see almost the same traffic as their superprojects ;-) >>> Maybe our emails crossed, but in the other mail I pointed out we could use >>> some sort of hidden ref (refs/superprojects/*) for that, which are >>> allowed to mark >>> any sort of sha1, which are allowed in the superproject/submodule context >>> to be fetched. >>> >>> So whenever you push to a superproject (a project that has a gitlink), >>> we would need to check serverside if that submodule is at us and mark the >>> correct sha1s in the submodule. Then you can disallow fetching most of the >>> sha1s >>> but still could have a correctly working submodule update mechanism. >> >> >> And what happens if the submodule isn't at us? Involving the serverside of >> a superproject in submodule fetching sounds wrong to me. Me thinks that >> the upstream of the submodule should always control if a sha1 is allowed >> to be fetched. Or did I understand you wrong? > > Yes and no. > The serverside submodule repository should be responsible for the ultimate > decision if you are allowed to fetch that sha1. But maybe on pushing the > superproject, we can store a hint in the submodule, that this sha1 is legit. > Although I may be missguided in my thinking here as the superproject > should have no influence on the submodule. Submodules should never be aware of their superproject. But a superproject does know its submodules, so I don't think the influence you describe here is a problem per se. It's just looking like a corner case to me, as in a lot of scenarios submodules do not live on the same server. And even if they do, a superproject has no canonical way of finding their submodule's repos (except for submodules that use relative URLs). So I'd rather like to see a generic solution first, before we think about adding an optimized version for certain setups later ;-) The only real itch I have with the "superproject declaring submodule sha1s fetchable on the server" approach is that it smells like a security problem. The access rights of superprojects are often different from those of the submodules it contains and this feels like a privilege escalation waiting to happen.