From: Junio C Hamano
Subject: Re: RFC: Subprojects
Date: Tue, 17 Jan 2006 17:41:48 -0800
Message-ID: <7vpsmq2tyb.fsf@assigned-by-dhcp.cox.net>
References: <43C52B1F.8020706@hogyros.de>
    <7vek3ah8f9.fsf@assigned-by-dhcp.cox.net>
    <200601161144.48245.Josef.Weidendorfer@gmx.de>
    <7vek37rj83.fsf@assigned-by-dhcp.cox.net>
    <7vfynnfkc8.fsf@assigned-by-dhcp.cox.net>
To: Daniel Barkalow
Cc: Josef Weidendorfer, git@vger.kernel.org

Daniel Barkalow writes:

> Incidentally, I don't think we'd want "gitlink" objects with the "gitlink"
> approach; we'd want trees to contain commit objects for subprojects.
> The "gitlink" thing that corresponds to ".git/HEAD" isn't an object, it's a
> tree entry, which, like ".git/HEAD" (or, more appropriately,
> ".git/refs/heads/something") maps a name to the hash of a commit object.

> Hmm... maybe libification should go ahead of subprojects. If access to the
> index weren't so often open-coded, it would just be a matter of having
> these entries in the data structure, but not actually returned by any
> current call, and it would be just like they were in some other structure.

And libification has been waiting for the core to settle ;-)
We have to start somewhere.

> Actually, it should be easy to have them in the index file but not in the
> main index data structure, by skipping over them in the for loop near the
> end of read_cache()....

Yeah, I guess I was vaguely thinking along those lines while I
was driving to work this morning.  I appreciate your spelling it
out to make things clearer.

> Side issue here: this implies that the kernel objects are in the
> superproject's repository, or at least accessible from it. So prune has to
> not remove them. So, if you've committed changes to a subproject but not
> yet committed the fact that you want to use the changed subproject into
> the superproject, fsck-objects has to find them somewhere.

Yes.  I was planning to have "$GIT_DIR/bind" that says:

    master kernel=linux-2.6/ gcc=gcc-4.0/

meaning:

    The project tracked by the "master" branch binds the
    project tracked by the "kernel" branch as its subproject
    at its linux-2.6/ subdirectory.

or something like that, so when you make a commit, you update
those other branches as needed.  You already raised that issue
at the end of your message, and I will explain how I think that
can/should be done as a response to that part later.

>> Reading such a commit is easy:
>>
>>     $ git-read-tree $tree ;# ;-)
>>
>> But that is cheating.
>
> This is for backwards compatibility, I assume?

This is done more for not having to touch *anything* that does
"index vs working file", "tree vs index" and "tree vs working
file via index".  It is also the easiest way to keep the "a
commit object name can be used in place of the tree object name
of the tree it contains" invariant.

Also I suspect this organization might help recursive
subprojects, but if that is the case, it is just a byproduct,
not a design goal.

>> When you have such an index, writing out the various trees looks like this:
>>
>>     $ git-write-tree ;# $tree
>>     $ git-write-tree --prefix=linux-2.6/ ;# $linuxsub^{tree}
>>     $ git-write-tree --prefix=gcc-4.0/ ;# $gccsub^{tree}
>>     $ git-write-tree \
>>         --bound=linux-2.6/ --bound=gcc-4.0/ ;# $primarysub^{tree}
>
> The hard thing here is getting the commits for the trees. The bind lines
> need commits, which means either identifying that we already have the
> correct commit object, because we didn't change anything in the
> subproject, or generating a new commit object with some message and the
> right parent. And we want to use commit objects, not tree objects, in the
> bind lines, so that, once we track a problem to the change of which commit
> is bound, we can treat the subproject as a project and debug it with
> bisect, rather than just having one tree that works and one that doesn't.

Your wording "get the commit" is a bit misleading.  Even when
the tree for a subproject happens to match a commit in the
subproject in a distant past, we would not want to use it unless
the user explicitly asked for it.  IOW, we do not actively go
and look for a commit.  Our subproject tree either matches the
subproject branch head, in which case we just reuse it, or we
make a new commit on top of that ourselves.

Let's say my project breaks with the latest kernel, and I
suspect that it would work with v2.6.13 sources.
To test that theory, I could:

    $ git branch -f kernel v2.6.13 ;# rewind
    $ git ls-files linux-2.6/ | xargs git update-index --force-remove
    $ git read-tree --prefix=linux-2.6/ -u kernel

to construct such a tree.  Maybe the latter two-command sequence
"ls-files & read-tree --prefix" deserves to become a command,
"git update-subproject kernel" [*1*].

The result may work as-is, or I may need to do some further
futzing in the linux-2.6/ directory before the result works.
Once the result starts working, I'd want to make a commit:

 - I compare the result of write-tree for the linux-2.6/ portion
   with the tree object name contained in the head commit of the
   "kernel" branch.  If they match, then the current "kernel"
   branch head commit is what I'll place on the "bind" line in
   my commit; I do not have to make a new commit in the "kernel"
   subproject in this case.

 - If the tree object does not match the "kernel" head, that
   means I have tweaked the kernel part further, on top of
   v2.6.13.  So I make a commit for the kernel subproject (whose
   parent is obviously v2.6.13), update the kernel branch head
   with that commit, and then record that tip-of-the-tree commit
   for the subproject on the "bind" line in my commit for the
   toplevel.

Or let's say my project builds with the latest kernel (IOW, I
did not do the branch -f kernel in the above), and I made some
custom tweaks in the kernel area.  The above procedure would
result in a new commit on top of the latest kernel, update the
"kernel" branch head, and make a commit for the toplevel that
records the updated "kernel" branch head on its "bind" line.

Note that the above procedure did not use the commit object name
recorded on the "bind" line at all in either case.  From the
mechanism point of view, it is the right thing to do.  From the
usability point of view, however, we may want to take notice
that the "bind" line commit and the bound branch head do not
match, and remind/warn the user about it.
If they differ because the user rewound the bound branch to use
a known working version, or made fixes in the subproject and
pulled the result into the bound branch (in which case there is
no funny rewinding involved), then this warning is extraneous.
But in the normal case of continuing to reuse the same vintage
of subprojects (and maybe making necessary adjustments to
subprojects while working on the main project), the commit
object on the "bind" line of the HEAD commit and the bound
branch head should match.

[Footnote]

*1* One could also do forward development on the kernel branch
in a separate working tree and fetch from there.  For example,
if our example "superproject" is in the embed/ directory, and
there is a linux/ directory next to it to house a kernel
repository, we could:

    $ cd ../linux/
    $ edit && compile && test
    $ git commit -m 'Fix for upstream, not just for embed'

to make an upstream fix, and then:

    $ cd ../embed/
    $ git fetch ../linux/ master:kernel

to update the "kernel" subproject branch head.  In such a case:

    $ git update-subproject kernel

would bring the subproject working tree and index up to date
with respect to the updated kernel branch.
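
The proposed "git update-subproject kernel" is given only a
branch name, so something has to map "kernel" to linux-2.6/.
One possible way, sketched below, is to look the directory up in
a file that uses the "$GIT_DIR/bind" line format proposed in
this message.  The helper name bind_prefix, its two arguments,
and reading the bind lines on standard input are assumptions
made for illustration only; nothing here is an existing git
command.

```shell
# bind_prefix SUPER SUB
# Read bind lines on stdin and print the directory at which branch SUB
# is bound into branch SUPER.  Hypothetical helper; the line format is
# the one proposed in the message:
#     master kernel=linux-2.6/ gcc=gcc-4.0/
bind_prefix() {
    super=$1
    sub=$2
    while read -r branch bindings; do
        # Skip lines that describe some other superproject branch.
        [ "$branch" = "$super" ] || continue
        for binding in $bindings; do
            # Each binding has the form <subproject-branch>=<directory>/
            if [ "${binding%%=*}" = "$sub" ]; then
                printf '%s\n' "${binding#*=}"
                return 0
            fi
        done
    done
    return 1    # no binding found for SUB under SUPER
}

# Assumed usage inside a hypothetical update-subproject script, with the
# two commands taken from the message (not checked against any
# particular git version):
#     prefix=$(bind_prefix master kernel <"$GIT_DIR/bind") || exit 1
#     git ls-files "$prefix" | xargs git update-index --force-remove
#     git read-tree --prefix="$prefix" -u kernel
```

With the example bind line from this message on standard input,
"bind_prefix master kernel" prints linux-2.6/, and the function
returns non-zero when the branch has no binding.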