From mboxrd@z Thu Jan 1 00:00:00 1970 From: Linus Torvalds Subject: Re: RFC: Subprojects Date: Sat, 14 Jan 2006 11:16:34 -0800 (PST) Message-ID: References: <43C52B1F.8020706@hogyros.de> <43C537C9.4090206@hogyros.de> <7vacdzkww3.fsf@assigned-by-dhcp.cox.net> Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: Johannes Schindelin , git@vger.kernel.org, Simon Richter X-From: git-owner@vger.kernel.org Sat Jan 14 20:17:18 2006 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by ciao.gmane.org with esmtp (Exim 4.43) id 1Exqtl-00069C-E4 for gcvg-git@gmane.org; Sat, 14 Jan 2006 20:17:13 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750755AbWANTRK (ORCPT ); Sat, 14 Jan 2006 14:17:10 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750818AbWANTRK (ORCPT ); Sat, 14 Jan 2006 14:17:10 -0500 Received: from smtp.osdl.org ([65.172.181.4]:19926 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S1750755AbWANTRI (ORCPT ); Sat, 14 Jan 2006 14:17:08 -0500 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id k0EJGaDZ012384 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sat, 14 Jan 2006 11:16:37 -0800 Received: from localhost (shell0.pdx.osdl.net [10.9.0.31]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id k0EJGYKO011825; Sat, 14 Jan 2006 11:16:34 -0800 To: Junio C Hamano In-Reply-To: <7vacdzkww3.fsf@assigned-by-dhcp.cox.net> X-Spam-Status: No, hits=0 required=5 tests= X-Spam-Checker-Version: SpamAssassin 2.63-osdl_revision__1.65__ X-MIMEDefang-Filter: osdl$Revision: 1.129 $ X-Scanned-By: MIMEDefang 2.36 Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: On Sat, 14 Jan 2006, Junio C Hamano wrote: > + The contained project is kept totally independent and does > not have to know it is contained. > > + The tree for the contained project can be rooted anywhere in > the containing project's tree. Right. > - The contained project cannot be rooted at the same level or > higher than the containing project; the containing project > can only delegate a whole subdirectory to the contained > project. Yes. However, I think this is actually a _huge_ advantage. The thing is, if you do the contained projects as "union projects" as you suggest, I will bet that it will really really suck, because it ends up losing the two positives above. In particular, any real independent project will have it's own "Makefile" or "configure-in", and often its own "src" subdirectory or other pseudo-standard names. And the "contained project as a link" approach has zero problems with that at all, exactly because it keeps the projects clearly separate - just linked (one way). > What does "git-diff-index/git-diff-tree/git-diff-files" would do > with them? I would actually argue that git itself wouldn't do a whole lot with them. There are real advantages to seeing only the diffs wrt _one_ of the projects, and I'd argue that git-diff-* would actually act like they now act for directories that they don't recurse into, ie you'd see something like :160000 160000 5eb57670... 3f1a42aa... M sub-project and it would be up to higher-level porcelain to recurse. Why? Partly because that's actually likely enough for a lot of users: you _can_ use just the raw git programs by just doing cd sub-project git diff .. git commit and so technically you aren't really missing a lot. The capabilities are there, you just have to do some more by hand (but in many ways that is _good_: it makes it obvious that you're really committing a _different_ subproject). The other reason? A lot of the git infrastructure really does only work on the "one project" level. The programs work with _one_ index, not two. Reading two trees is perfectly possible, but unless you keep them in separate stages, you can't separate them afterwards. IOW, trying to be recursive really does end up being a big change, for very little gain (and for a lot of potential bugs and instability). In contrast, doing it at a higher level means that you have a simple and reliable lower level that you can trust. Layering is good. > Fetching/cloning at the core level is easy. "git-fetch-pack" > would just need to do one level, but Porcelains need to address > how to actually arrange the subprojects cloning to happen, which > is harder. > > "git clone" would say: "Ah, now I see these gitlinks; we need to > clone them. Actually, I would say no - that's actually not a "clone" operation so much as a "checkout" operation. There are strong arguments that you should _not_ clone sub-projects when you clone the top-level project: there's no reason to. Anybody else who clones it will have all the information you have, so cloning th esub-project is just extra work. So only if you actually check it out (which is often in practice the second stage of the cloning, of course) do you want to fetch the subproject too. But even then you might want to ask the user (he may have a local repository for that sub-project somewhere else, so going to the "canonical name" might be the wrong thing to do - and he might not even care, because he might want to work _just_ on the top-level project). > Now I'll think aloud about a completely different design. > > We could simply overlay the projects. I think this is what > Johannes suggested earlier. > > You keep one branch for each "subproject", and make commits into > each branch (i.e. if you modified files for the upstream kernel, > the change is committed to the branch for linux-2.6 subproject), > but when checking things out, you do an equivalent of octopus > merge across subprojects. I think this one has serious disadvantages: - it's much less obvious when there are common names and especially common subdirectories. - in _practice_, almost all sub-projects are kept in sub-directories. Are you doing to change the sub-project git tree? How are you going to merge back to the original sub-project? - iow, I think this only works for sub-projects that are totally controlled by the top-level project - in which case they might as well just be totally merged into the top level (the way we did with the "tools" project, and largely with "gitk"). in the "gitk" case, we could actually continue to keep gitk a separate project, but that was really fortunate: it's purely because gitk ends up being a single file, with no Makefile at all to build it independently etc. The moment we integrated the "tools" sub-project into git, we lost the ability to do that, exactly because they now needed to share Makefiles etc, making all further development very inter-twined. Put another way: the moment you have linkages going both ways between the subproject and the top-level project, it's no longer two separate projects. At that point, it in practice becomes one, since the sub-project can no longer do independent development without merging becoming a big issue. The advantage of having a "git link" is exactly the fact that the dependency goes only one way. The subproject remains truly independent. Linus