From mboxrd@z Thu Jan 1 00:00:00 1970 From: Junio C Hamano Subject: Re: Notes on Subproject Support Date: Sun, 22 Jan 2006 21:48:06 -0800 Message-ID: <7v64ob1omh.fsf@assigned-by-dhcp.cox.net> References: <7v3bjfafql.fsf@assigned-by-dhcp.cox.net> <7v7j8r7e7s.fsf@assigned-by-dhcp.cox.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: git@vger.kernel.org, Petr Baudis X-From: git-owner@vger.kernel.org Mon Jan 23 06:48:19 2006 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by ciao.gmane.org with esmtp (Exim 4.43) id 1F0uYn-0003gl-Rw for gcvg-git@gmane.org; Mon, 23 Jan 2006 06:48:14 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964811AbWAWFsK (ORCPT ); Mon, 23 Jan 2006 00:48:10 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S964812AbWAWFsK (ORCPT ); Mon, 23 Jan 2006 00:48:10 -0500 Received: from fed1rmmtao01.cox.net ([68.230.241.38]:8947 "EHLO fed1rmmtao01.cox.net") by vger.kernel.org with ESMTP id S964811AbWAWFsJ (ORCPT ); Mon, 23 Jan 2006 00:48:09 -0500 Received: from assigned-by-dhcp.cox.net ([68.4.9.127]) by fed1rmmtao01.cox.net (InterMail vM.6.01.05.02 201-2131-123-102-20050715) with ESMTP id <20060123054708.NPZT15695.fed1rmmtao01.cox.net@assigned-by-dhcp.cox.net>; Mon, 23 Jan 2006 00:47:08 -0500 To: Daniel Barkalow In-Reply-To: <7v7j8r7e7s.fsf@assigned-by-dhcp.cox.net> (Junio C. Hamano's message of "Sun, 22 Jan 2006 20:36:23 -0800") User-Agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux) Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: Junio C Hamano writes: > In other words, I think the design so far does not require us to > touch tree objects at all, and I'd be happy if we do not have to. > > One reason I started the bound commit approach was exactly > because I only needed to muck with commit objects and did not > have to touch trees and blobs; after trying to implement the > core level for "gitlink", which I ended up touching quite a lot > and have abandoned for now. BTW, let's digress a bit. I think recording "commit" in the tree objects is in line with the logical organization described in README: "blob" and "tree" represent a state, and have *nothing* to do with how we came about to that state. The historyh is described in "commit" objects. The bound commit approach keeps that property. The "gitlink" approach, as I understand how Linus outlined in his original suggestion, is a bit different. The link objects appear in tree objects, and when you "git cat-file link" one of them, you would see something like this: commit 5b2bcc7b2d546c636f79490655b3347acc91d17f name kernel So in that sense, "gitlink" approach departs from the original premise of "commit" being the only thing that ties things together. Tree objects with "gitlink" know where they are in the history [*1*]. By this, I do not mean to say that "gitlink" approach is inferior because it breaks that original premise. I am just pointing it out as a difference between two approaches. Now, the current way index file is used is as a staging area to create a new commit on top of the tip of the current branch. However, it is interesting to note that logically, by itself *alone*, it cannot be used that way. The information the index file records is something that can be used to write out a tree object, and not a commit, because it does not know where the current state sits in the history. We have two separate files, $GIT_DIR/HEAD that records which branch we are on, and the branch head ref the HEAD points at, which records where the current index came from, for that purpose. The latter tells us what commit we should use as the parent commit if we create such a commit, and the former tells us which branch head to update once we create one. So in that sense, the index file is just a staging area to create a new tree, not a new commit. We could have done things differently. I am not advocating to do the following change, but offering a possibility as a thought experiment. It just felt interesting enough to point them out. The index file could have recorded what commit the current state recorded in the index came from. By recording the commit the index was read from in the index itself, independently from the $GIT_DIR/refs/heads/$branch file, we could have been able to allow fetching into the current branch. When the $branch file for the current branch was updated by a fast-forward fetch, we would notice that the commit recorded there no longer match what is recorded in the index. Another interesting consequence is if the development is a single repository and linear, we did not even need any file in $GIT_DIR/refs/ ("branchless git"). The commit recorded as the topmost in the index file itself would have served as the tip of the development, and we would have been able to tangle the history starting from the commit in the index file. While we are doing a thought experiment, let's say we allow to record more than one commits the current index is based upon. 'git merge' would record all the parent commits there, so that writing out the merge result out of the index file as a tree and then recording these commits as parents would have been the way to create a merge commit. We would not need the auxiliary file $GIT_DIR/MERGE_HEAD if we did so. In other words, if the index file recorded the commits its contents were based upon, instead of being a staging area for a new tree, it would have been a staging area for a new commit. Now, the latest proposal, borrowing your idea, records the subproject commits bound to subdirectories in the index itself. This is halfway to make the index file a staging area for the next commit. If we were to do that, we also *could* record the commits the current index is based upon, so that it can truly be used as a staging area to create a new commit, not just a tree. On the other hand, this could be a reason *not* to do the `update-index --bind` to record the subproject information in the index file. An auxiliary file such as $GIT_DIR/bind might be sufficient, just like $GIT_DIR/MERGE_HEAD has been good enough for us so far. One difference between MERGE_HEAD and bind is that the former is very transient -- only exists during a merge while the latter is persistent while the top commit is checked out and being worked on. [Footnote] *1* One good property of "gitlink" approach is that we *could* extend this blob-like object to store arbitrary human readable information to represent a point-in-time from an arbitrary foreign SCM system. IOW, we do not necessarily have to require `commit` line that name a git commit to be there. It could say "Please slurp http://www.kernel.org/pub/software/.../git.tar.gz and extract it in git/ directory". Of course, for such a toplevel project commit, the tool may not be able to do a checkout automatically and require the user to cat-file the link, download a tarball and extract the subtree there manually. The bound commit approach requires you to have git commit object names on the `bind` lines, and it is fundamentally much harder to extend it to allow interfacing with foreign (non-)SCM systems.