Notes on Subproject Support

* Notes on Subproject Support
@ 2006-01-23  1:35 Junio C Hamano
  2006-01-23  3:50 ` Daniel Barkalow
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-23  1:35 UTC (permalink / raw)
  To: git; +Cc: Daniel Barkalow, Petr Baudis

This is still a draft/WIP, but "release early" is a good
discipline, so...

-- >8 --

Notes on Subproject Support
===========================
Junio C Hamano <junkio@cox.net>
v1.0 January 22, 2006

Scenario
--------

The examples in the following discussion show how this proposal
plans to help this:

. A project to build an embedded Linux appliance "gadget" is
  maintained with git.

. The project uses linux-2.6 kernel as its subcomponent.  It
  starts from a particular version of the mainline kernel, but
  adds its own code and build infrastructure to fit the
  appliance's needs.

. The working tree of the project is laid out this way:
+
------------
 Makefile       - Builds the whole thing.
 linux-2.6/     - The kernel, perhaps modified for the project.
 appliance/     - Applications that run on the appliance, and
                  other bits.
------------

. The project is willing to maintain its own changes out of tree
  of the Linux kernel project, but would want to be able to feed
  the changes upstream, and incorporate upstream changes to its
  own tree, taking advantage of the fact that both itself and
  the Linux kernel project are version controlled with git.

The idea here is to:

. Keep `linux-2.6/` part as an independent project.  The work by
  the project on the kernel part can be naturally exchanged with
  the other kernel developers this way.  Specifically, a tree
  object contained in commit objects belonging to this project
  does *not* have linux-2.6/ directory at the top.

. Keep the `appliance/` part as another independent project.
  Applications are supposed to be more or less independent from
  the kernel version, but some other bits might be tied to a
  specific kernel version.  Again, a tree object contained in
  commit objects belonging to this project does *not* have
  appliance/ directory at the top.

. Have another project that combines the whole thing together,
  so that the project can keep track of which versions of the
  parts are built together.

We will call the project that binds things together the
'toplevel project'.  Other projects that hold `linux-2.6/` part
and `appliance/` part are called 'subprojects'.

Notice that `Makefile` at the top is part of the toplevel
project in this example, but it is not necessary.  We could
instead have the appliance subproject include this file.  In
such a setup, the appliance subproject would have had `Makefile`
and `appliance/` directory at the toplevel.

Setting up
----------

Let's say we have been working on the appliance software,
independently version controlled with git.  Also the kernel part
has been version controlled separately, like this:
------------
$ ls -dF current/*/.git current/*
current/Makefile    current/appliance/.git/  current/linux-2.6/.git/
current/appliance/  current/linux-2.6/
------------

Now we would want to get a combined project.  First we would
clone from these repositories (which is not strictly needed --
we could use `$GIT_ALTERNATE_OBJECT_DIRECTORIES` instead):

------------
$ mkdir combined && cd combined
$ cp ../current/Makefile .
$ git init-db
$ mkdir -p .git/refs/subs/{kernel,gadget}/{heads,tags}
$ git clone-pack ../current/linux-2.6/ master | read kernel_commit junk
$ git clone-pack ../current/appliance/ master | read gadget_commit junk
------------

We will introduce a new command to set up a combined project:

------------
$ git bind-projects \
	$kernel_commit linux-2.6/ \
	$gadget_commit appliance/
------------

This would do an equivalent of:

------------
$ git read-tree --prefix=linux-2.6/ $kernel_commit
$ git read-tree --prefix=appliance/ $gadget_commit
------------
[NOTE]
============
Earlier outlines sent to the git mailing list talked
about `$GIT_DIR/bind` to record what subproject are bound to
which subtree in the curent working tree and index.  This
proposal instead records that information in the index file
when `--prefix=linux-2.6/` is given to `read-tree`.

Also note that in this round of proposal, there is no separate
branches that keep track of heads of subprojects.
============

Let's not forget to add the `Makefile`, and check the whole
thing out from the index file.
------------
$ git add Makefile
$ git checkout-index -f -u -q -a
------------

Now our directory should be identical with the `current`
directory.  After making sure of that, we should be able to
commit the whole thing:

------------
$ diff -x .git -r ../current ../combined
$ git commit -m 'Initial toplevel project commit'
------------

Which should create a new commit object that records what is in
the index file as its tree, with `bind` lines to record which
subproject commit objects are bound at what subdirectory, and
updates the `$GIT_DIR/refs/heads/master`.  Such a commit object
might look like this:
------------
tree 04803b09c300c8325258ccf2744115acc4c57067
bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
author Junio C Hamano <junio@kernel.org> 1137965565 -0800
committer Junio C Hamano <junio@kernel.org> 1137965565 -0800

Initial toplevel project commit
------------
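
To make the proposed layout concrete, here is a minimal Python sketch that
reads the header of such a commit object and collects its `bind` lines.  The
`Commit` type and the `parse_commit` helper are names made up for this
illustration; only the `tree`, `parent` and `bind` header lines come from
the proposal above.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    tree: str
    parents: list
    binds: dict  # subdirectory -> bound subproject commit
    message: str


def parse_commit(text: str) -> Commit:
    """Split a commit object into header fields and message (illustrative only)."""
    header, _, message = text.partition("\n\n")
    tree, parents, binds = "", [], {}
    for line in header.splitlines():
        key, _, rest = line.partition(" ")
        if key == "tree":
            tree = rest
        elif key == "parent":
            parents.append(rest)
        elif key == "bind":
            # Proposed format: "bind <commit> <subdirectory>"
            sha, _, path = rest.partition(" ")
            binds[path] = sha
    return Commit(tree, parents, binds, message)


example = """\
tree 04803b09c300c8325258ccf2744115acc4c57067
bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
author Junio C Hamano <junio@kernel.org> 1137965565 -0800
committer Junio C Hamano <junio@kernel.org> 1137965565 -0800

Initial toplevel project commit
"""
commit = parse_commit(example)
```

The initial commit has no `parent` line, so `commit.parents` is empty while
`commit.binds` maps each subdirectory to its subproject commit.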

Making further commits
----------------------

The easiest case is when you updated the Makefile without
changing anything in the subprojects.  In such a case, we just
need to create a new commit object that records the new tree
with the current `HEAD` as its parent, and with the same set of
`bind` lines.

When we have changes to the subproject part, we would make a
separate commit to the subproject part and then record the whole
thing by making a commit to the toplevel project.  The user
interaction might go this way:
------------
$ git commit
error: you have changes to the subproject bound at linux-2.6/.
$ git commit --subproject linux-2.6/
$ git commit
------------

With the new `--subproject` option, the directory structure
rooted at `linux-2.6/` part is written out as a tree, and a new
commit object that records that tree object with the commit
bound to that portion of the tree (`5b2bcc7b` in the above
example) as its parent is created.  Then the final `git commit`
would record the whole tree with updated `bind` line for the
`linux-2.6/` part.
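
The bookkeeping described above can be modelled in a few lines of Python.
This is only a sketch under stated assumptions: `commit_subproject` is a
hypothetical helper, the binding table is a plain dict standing in for
whatever the index would record, and the commit body omits the author and
committer lines for brevity.  The one part taken from git as it exists is
how an object name is computed.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    # Git names an object by the SHA-1 of "<type> <size>\0" followed by its body.
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_subproject(index_binds, path, new_tree, message):
    """Model of the proposed `git commit --subproject <path>` (hypothetical)."""
    parent = index_binds[path]  # the commit currently bound at this subdirectory
    body = f"tree {new_tree}\nparent {parent}\n\n{message}\n".encode()
    new_commit = object_id("commit", body)
    updated = dict(index_binds)
    updated[path] = new_commit  # the next toplevel commit records this binding
    return new_commit, updated


binds = {
    "linux-2.6/": "5b2bcc7b2d546c636f79490655b3347acc91d17f",
    "appliance/": "0bdd79af62e8621359af08f0afca0ce977348ac7",
}
# The empty tree is used as a stand-in for the tree written from linux-2.6/.
stand_in_tree = object_id("tree", b"")
new_commit, binds_after = commit_subproject(
    binds, "linux-2.6/", stand_in_tree, "Tweak the kernel"
)
```

Only the `linux-2.6/` binding changes; the `appliance/` binding is carried
over untouched into the next toplevel commit.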

Checking out
------------

After cloning such a toplevel project, `git clone` without `-n`
option would check out the working tree.  This is done by
reading the tree object recorded in the commit object (which
records the whole thing), and adding the information from the
"bind" line to the index file.

------------
$ cd ..
$ git clone -n combined cloned ;# clone the one we created earlier
$ cd cloned
$ git checkout
------------

This round of proposal does not maintain separate branch heads
for subprojects.  The bound commits and their subdirectories
are recorded in the index file from the commit object, so there
is no need to do anything other than updating the index and the
working tree.

Switching branches
------------------

Along with the traditional two-way merge by `read-tree -m -u`,
we would need to look at:

. `bind` lines in the current `HEAD` commit.

. `bind` lines in the commit we are switching to.

. subproject binding information in the index file.

to make sure we do sensible things.

Just like until very recently we did not allow switching
branches when two-way merge would lose local changes, we can
start by refusing to switch branches when the subprojects bound
in the index do not match what is recorded in the `HEAD` commit.

Because in this round of the proposal we do not use the
`$GIT_DIR/bind` file nor separate branches to keep track of
heads of the subprojects, there is nothing else other than the
working tree and the index file that needs to be updated when
switching branches.
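
The conservative starting rule could look like the following Python sketch.
The function name, the exception and the dict representation of bindings
(subdirectory mapped to commit) are assumptions made for illustration, not
part of the proposal.

```python
class SwitchRefused(Exception):
    pass


def bindings_after_switch(head_binds, index_binds, target_binds):
    """Refuse to switch when the index bindings differ from the HEAD commit."""
    changed = sorted(
        path
        for path in set(head_binds) | set(index_binds)
        if head_binds.get(path) != index_binds.get(path)
    )
    if changed:
        raise SwitchRefused("subprojects differ from HEAD at: " + ", ".join(changed))
    # Nothing but the index (and the working tree) needs updating.
    return dict(target_binds)


head = {"linux-2.6/": "k1", "appliance/": "g1"}
target = {"linux-2.6/": "k2", "appliance/": "g1"}
clean = bindings_after_switch(head, dict(head), target)
```

With a clean index the result is simply the binding set of the commit being
switched to; a locally rebound subproject raises `SwitchRefused` instead.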

Merging
-------

Merging two branches of the toplevel projects can use the
traditional merging mechanism mostly unchanged.  The merge base
computation can be done using the `parent` ancestry information
taken from the two toplevel project branch heads being merged,
and merging of the whole tree can be done with a three-way merge
of the whole tree using the merge base and two head commits.
For reasons described later, we would not merge the subproject
parts of the trees during this step, though.

When the two branch heads use different versions of subproject,
things get a bit tricky.  First, let's forget for a moment about
the case where they bind the same project at different location.
We would refuse if they do not have the same number of `bind`
lines that bind something at the same subdirectories.

------------
$ git merge 'Merge in a side branch' HEAD side
error: the merged heads have subprojects bound at different places.
 ours:
	linux-2.6/
	appliance/
 theirs:
	kernel/
	gadget/
	manual/
------------

Such renaming can be handled by first moving the bind points in
our branch, and redoing the merge (this is a rare operation
anyway).  It might go like this:

------------
$ git bind-projects \
	$kernel_commit kernel/ \
	$gadget_commit gadget/
$ git commit -m 'Prepare for merge with side branch'
$ git merge 'Merge in a side branch' HEAD side
error: the merged heads have subprojects bound at different places.
 ours:
	kernel/
	gadget/
 theirs:
	kernel/
	gadget/
	manual/
------------

Their branch added another subproject, so this did not work (or
it could be the other way around -- we might have been the one
with `manual/` subproject while they didn't).  This suggests
that we may want an option to `git merge` to allow taking a
union of subprojects.  Again, this is a rare operation, and
always taking a union would have created a toplevel project that
had both `kernel/` and `linux-2.6/` bound to the same Linux
kernel project from possibly different vintage, so it would be
prudent to require the set of bound subprojects to exactly match
and give the user an option to take a union.
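
A minimal Python sketch of that policy, strict by default with an explicit
opt-in for the union, might read as follows.  The names `merged_bind_points`
and `BindMismatch` are hypothetical; only the behaviour is taken from the
text above.

```python
class BindMismatch(Exception):
    pass


def merged_bind_points(ours, theirs, union=False):
    """Return the bind points of the merge result, or refuse if they differ."""
    ours, theirs = set(ours), set(theirs)
    if ours != theirs and not union:
        raise BindMismatch(f"ours: {sorted(ours)} theirs: {sorted(theirs)}")
    return sorted(ours | theirs)


after_rename = merged_bind_points(
    ["kernel/", "gadget/"], ["kernel/", "gadget/", "manual/"], union=True
)
# Why the union is not the default: without renaming first, the kernel
# would end up bound twice, at linux-2.6/ and at kernel/.
before_rename = merged_bind_points(
    ["linux-2.6/", "appliance/"], ["kernel/", "gadget/", "manual/"], union=True
)
```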

------------
$ git merge --union-subprojects 'Merge in a side branch' HEAD side
error: the subproject at `kernel/` needs to be merged first.
------------

Here, the version of the Linux kernel project in the `side`
branch was different from what our branch had on our `bind`
line.  On what kind of difference should we give this error?
Initially, I think we could require that one be a fast-forward of
the other (ours might be ahead of theirs, or the other way
around), and take the descendant.
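
As a sketch of that initial rule, assuming the subproject history is
available as a mapping from each commit to its parents (the commit names
`K0` to `K4` and both helper functions are invented for this example):

```python
def ancestors(parents, commit):
    """All commits reachable from `commit` via parent links, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def pick_fast_forward(parents, ours, theirs):
    """Return the descendant if one side fast-forwards to the other, else None."""
    if ours in ancestors(parents, theirs):
        return theirs
    if theirs in ancestors(parents, ours):
        return ours
    return None  # diverged: the subproject needs a real merge first


# Hypothetical kernel history: K0 <- K1 <- K2 <- K3, with K4 forked from K2.
kernel = {"K1": ["K0"], "K2": ["K1"], "K3": ["K2"], "K4": ["K2"]}
ahead = pick_fast_forward(kernel, "K1", "K3")
diverged = pick_fast_forward(kernel, "K3", "K4")
```

Here `ahead` is `K3` whichever side holds it, and `diverged` is `None`,
which corresponds to the "needs to be merged first" error.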

Or we could do an independent merge of subprojects heads, using
the `parent` ancestry of the bound subproject heads to find
their merge-base and doing a three-way merge.  This would leave
the merge result in the subproject part of the working tree and
the index.

[NOTE]
This is the reason we did not do the whole-tree three way merge
earlier.  The subproject commit bound to the merge base commit
used for the toplevel project may not be the merge base between
the subproject commits bound to the two toplevel project
commits.
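
A small constructed example shows how the two can disagree.  Everything in
this Python sketch (the commit names, the histories and the helper
functions) is made up for illustration.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit` via parent links, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not themselves ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# Toplevel history: T1 and T2 both fork from T0.
toplevel = {"T1": ["T0"], "T2": ["T0"]}
# Kernel history: K0 <- K1 <- K2 <- K3, with K4 forked from K2.
kernel = {"K1": ["K0"], "K2": ["K1"], "K3": ["K2"], "K4": ["K2"]}
# Which kernel commit each toplevel commit binds at linux-2.6/.
bound = {"T0": "K1", "T1": "K3", "T2": "K4"}

(top_base,) = merge_bases(toplevel, "T1", "T2")
sub_bases = merge_bases(kernel, bound["T1"], bound["T2"])
```

The toplevel merge base `T0` binds `K1`, but the merge base of the two bound
kernel commits is `K2`, so a whole-tree three-way merge based on `T0` would
use the wrong ancestor for the kernel part.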

So let's deal with the case to merge only a subproject part into
our tree first.

Merging subprojects
-------------------

An operation of more practical importance is to be able to merge
in changes done outside to the projects bound to our toplevel
project.

------------
$ git pull --subproject=kernel/ git://git.kernel.org/.../linux-2.6/
------------

might do:

. fetch the current `HEAD` commit from Linus.
. find the subproject commit bound at kernel/ subtree.
. perform the usual three-way merge of these two commits, in
  `kernel/` part of the working tree.

After that, `git commit \--subproject` option would be needed to
make a commit.

[NOTE]
This suggests that we would need to have something similar to
`MERGE_HEAD` for merging the subproject part.
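
For reference, the "usual three-way merge" at the tree level can be
sketched in Python as below.  Trees are modelled as dicts from path to blob
name, and only whole-file decisions are made; content-level merging and the
restriction to the `kernel/` subdirectory are left out.  The paths and blob
names are invented.

```python
def three_way(base, ours, theirs):
    """Path-level three-way merge; returns (merged tree, conflicting paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (including both deleted)
            result = o
        elif o == b:      # only they changed it
            result = t
        elif t == b:      # only we changed it
            result = o
        else:             # both changed it differently
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"Makefile": "m0", "init/main.c": "i0", "README": "r0", "fs/x.c": "x0"}
ours = {"Makefile": "m1", "init/main.c": "i0", "README": "r0", "fs/x.c": "x1"}
theirs = {"Makefile": "m0", "init/main.c": "i2", "fs/x.c": "x2"}
merged, conflicts = three_way(base, ours, theirs)
```

In this example our `Makefile` change and their `init/main.c` change are
both kept, their removal of `README` is honoured, and `fs/x.c` is reported
as a conflict to be resolved by hand before `git commit --subproject`.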

* Re: Notes on Subproject Support
  2006-01-23  1:35 Notes on Subproject Support Junio C Hamano
@ 2006-01-23  3:50 ` Daniel Barkalow
  2006-01-23  4:36   ` Junio C Hamano
  2006-01-23  8:00 ` Junio C Hamano
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Daniel Barkalow @ 2006-01-23  3:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Petr Baudis

On Sun, 22 Jan 2006, Junio C Hamano wrote:

> Also note that in this round of proposal, there is no separate
> branches that keep track of heads of subprojects.

Interesting; I think it may become useful to allow for such heads, but we 
can deal with that when it arises. (e.g., maybe you want to use topic 
branches in the kernel development you do in the linux-2.6/ subdirectory 
of your superproject working tree; so long as the core isn't using refs 
for its own purposes, this is up to the user to keep straight and we can 
help later when we have usage notes)

> ============
> 
> Let's not forget to add the `Makefile`, and check the whole
> thing out from the index file.
> ------------
> $ git add Makefile

Maybe bind-projects should be "add-projects", to match "add", which has a 
similar effect at the user level?

> $ git checkout-index -f -u -q -a
> ------------
> 
> Now our directory should be identical with the `current`
> directory.  After making sure of that, we should be able to
> commit the whole thing:
> 
> ------------
> $ diff -x .git -r ../current ../combined
> $ git commit -m 'Initial toplevel project commit'
> ------------
> 
> Which should create a new commit object that records what is in
> the index file as its tree, with `bind` lines to record which
> subproject commit objects are bound at what subdirectory, and
> updates the `$GIT_DIR/refs/heads/master`.  Such a commit object
> might look like this:
> ------------
> tree 04803b09c300c8325258ccf2744115acc4c57067

Does this tree include trees for the bound projects?

> bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
> bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
> author Junio C Hamano <junio@kernel.org> 1137965565 -0800
> committer Junio C Hamano <junio@kernel.org> 1137965565 -0800
> 
> Initial toplevel project commit
> ------------
> 
> 
> Making further commits
> ----------------------
> 
> The easiest case is when you updated the Makefile without
> changing anything in the subprojects.  In such a case, we just
> need to create a new commit object that records the new tree
> with the current `HEAD` as its parent, and with the same set of
> `bind` lines.
> 
> When we have changes to the subproject part, we would make a
> separate commit to the subproject part and then record the whole
> thing by making a commit to the toplevel project.  The user
> interaction might go this way:
> ------------
> $ git commit
> error: you have changes to the subproject bound at linux-2.6/.
> $ git commit --subproject linux-2.6/
> $ git commit
> ------------

I think "cd linux-2.6 && git commit" should work for the subproject, too, 
but that can be a later enhancement.

> With the new `--subproject` option, the directory structure
> rooted at `linux-2.6/` part is written out as a tree, and a new
> commit object that records that tree object with the commit
> bound to that portion of the tree (`5b2bcc7b` in the above
> example) as its parent is created.

And the commit is written to the index, in the special slot for the 
subproject, replacing its parent, I assume.

> Switching branches
> ------------------
> 
> Along with the traditional two-way merge by `read-tree -m -u`,
> we would need to look at:
> 
> . `bind` lines in the current `HEAD` commit.
> 
> . `bind` lines in the commit we are switching to.
> 
> . subproject binding information in the index file.
> 
> to make sure we do sensible things.

This is one place I think storing the bindings in the commit is awkward. 
read-tree deals in trees (hence the name), but will need information from 
the commit.

I think it should be possible to hide the existence of subtrees in an 
add-on to the struct tree API such that code that doesn't handle it 
specifically doesn't see a difference, similarly to how the index file can 
be handled. (parse_tree would fill out the structure as if the subproject 
were a tree instead of a commit, assuming that the structure it's 
pretending to be is the full tree, but there would be an additional 
field for the commit if it's a subproject, until we've gone through 
everything to make it work with subprojects).

I'm hoping to kill off the other tree object parser, which is only used by 
ls-tree and diff-index at this point, but my workstation's home directory 
hard drive seems to have gotten weirdly messed up at the hardware level 
(and seems to have lost a lot of the contents of unused storage, or 
something), so this may take a little while. At that point, whatever 
special things we do in tree objects can be handled automatically with 
changes only to a single location.

	-Daniel
*This .sig left intentionally blank*

* Re: Notes on Subproject Support
  2006-01-23  3:50 ` Daniel Barkalow
@ 2006-01-23  4:36   ` Junio C Hamano
  2006-01-23  5:48     ` Junio C Hamano
  2006-01-23 16:31     ` Daniel Barkalow
  0 siblings, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-23  4:36 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

>> tree 04803b09c300c8325258ccf2744115acc4c57067
>
> Does this tree include trees for the bound projects?

Yes, this part has not been changed from earlier thoughts.

>> bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
>> bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
>> author Junio C Hamano <junio@kernel.org> 1137965565 -0800
>> committer Junio C Hamano <junio@kernel.org> 1137965565 -0800

The 04803b... tree has everything.  If you run git-ls-tree on
04803b..., you would see a tree object recorded at linux-2.6, and
it is the same as the tree associated with the commit 5b2bcc...
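
That invariant could be checked along these lines.  This is only a Python
sketch: the three dicts stand in for what `git-ls-tree` on the toplevel
tree, the `bind` lines, and the bound commits' `tree` lines would provide,
the helper name and the appliance tree name are made up, and only bind
points directly under the top directory are handled.

```python
def bound_trees_match(toplevel_tree, binds, commit_tree):
    """True if every bind point holds exactly the tree of its bound commit.

    toplevel_tree: entry name -> tree name, as listed for the toplevel tree
    binds:         subdirectory -> bound subproject commit
    commit_tree:   commit -> name of the tree that commit records
    """
    return all(
        toplevel_tree.get(path.rstrip("/")) == commit_tree[sha]
        for path, sha in binds.items()
    )


binds = {"linux-2.6/": "5b2bcc...", "appliance/": "0bdd79..."}
commit_tree = {"5b2bcc...": "kernel-tree", "0bdd79...": "gadget-tree"}
toplevel_tree = {"linux-2.6": "kernel-tree", "appliance": "gadget-tree"}
consistent = bound_trees_match(toplevel_tree, binds, commit_tree)
```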

> I think "cd linux-2.6 && git commit" should work for the subproject, too, 
> but that can be a later enhancement.

It's just a matter of Porcelain scripting, so that is probably
true.  However I do not want people to get too used to it and
expect "cd Documentation && git commit" to work in git.git
repository.

>> With the new `--subproject` option, the directory structure
>> rooted at `linux-2.6/` part is written out as a tree, and a new
>> commit object that records that tree object with the commit
>> bound to that portion of the tree (`5b2bcc7b` in the above
>> example) as its parent is created.
>
> And the commit is written to the index, in the special slot for the 
> subproject, replacing its parent, I assume.

Yes.  It would probably be done with `update-index --bind` to
update the bound subproject commit there.

>> Switching branches
>> ------------------
>> 
>> Along with the traditional two-way merge by `read-tree -m -u`,
>> we would need to look at:
>> 
>> . `bind` lines in the current `HEAD` commit.
>> 
>> . `bind` lines in the commit we are switching to.
>> 
>> . subproject binding information in the index file.
>> 
>> to make sure we do sensible things.
>
> This is one place I think storing the bindings in the commit is awkward. 
> read-tree deals in trees (hence the name), but will need information from 
> the commit.

That's why it is 'along with'.  Dealing with binding information
can be done between commits and index without bothering tree
objects.  read-tree would not have to deal with it, and I think
keeping it that way is probably a good idea.

In other words, I think the design so far does not require us to
touch tree objects at all, and I'd be happy if we do not have to.

One reason I started the bound commit approach was exactly
because I only needed to muck with commit objects and did not
have to touch trees and blobs.  I had tried to implement the
core level for "gitlink", ended up touching quite a lot, and
have abandoned it for now.

Here is an update to the still-WIP draft.

-- >8 --
Separate role of read-tree and update-index cleaner

The previous draft prematurely merged what read-tree --prefix
does with what update-index --bind would do.  Keep them separate
for now until we know what the common patterns would be.

Introduce 'update-index --unbind'.  We would probably need a new
command that extracts bind information out of index when we
start writing Porcelainish, but it is not specified yet.

Attempt to clarify what the "merging into subproject part" would
do a bit.  "git pull --subproject=" is fetch + merge, just like
the current subproject-unaware 'git pull' is.

---
diff --git a/Subpro.txt b/Subpro.txt
index 4036e71..837cab8 100644
--- a/Subpro.txt
+++ b/Subpro.txt
@@ -95,19 +95,22 @@ $ git bind-projects \
 	$gadget_commit appliance/
 ------------
 
-This would do an equivalent of:
+This would probably do an equivalent of:
 
 ------------
+$ rm -f "$GIT_DIR/index"
 $ git read-tree --prefix=linux-2.6/ $kernel_commit
 $ git read-tree --prefix=appliance/ $gadget_commit
+$ git update-index --bind linux-2.6/ $kernel_commit
+$ git update-index --bind appliance/ $gadget_commit
 ------------
 [NOTE]
 ============
 Earlier outlines sent to the git mailing list talked
 about `$GIT_DIR/bind` to record what subproject are bound to
-which subtree in the curent working tree and index.  This
+which subtree in the current working tree and index.  This
 proposal instead records that information in the index file
-when `--prefix=linux-2.6/` is given to `read-tree`.
+with `update-index --bind` command.
 
 Also note that in this round of proposal, there is no separate
 branches that keep track of heads of subprojects.
@@ -258,9 +261,11 @@ our branch, and redoing the merge (this 
 anyway).  It might go like this:
 
 ------------
-$ git bind-projects \
-	$kernel_commit kernel/ \
-	$gadget_commit gadget/
+$ git reset
+$ git update-index --unbind linux-2.6/
+$ git update-index --unbind appliance/
+$ git update-index --bind $kernel_commit kernel/
+$ git update-index --bind $gadget_commit gadget/
 $ git commit -m 'Prepare for merge with side branch'
 $ git merge 'Merge in a side branch' HEAD side
 error: the merged heads have subprojects bound at different places.
@@ -336,7 +341,55 @@ make a commit.
 
 [NOTE]
 This suggests that we would need to have something similar to
-`MERGE_HEAD` for merging the subproject part.
+`MERGE_HEAD` for merging the subproject part.  In the case of
+merging two toplevel project commits, we probably can read the
+`bind` lines from the `MERGE_HEAD` commit and either our `HEAD`
+commit or our index file.  Further, we probably would require
+that the latter two must match, just as we currently require the
+index file matches our `HEAD` commit before `git merge`.
 
+Just like the current `pull = fetch + merge` semantics, the
+subproject aware version `git pull \--subproject=frotz` would be
+a `git fetch \--subproject=frotz` followed by a `git merge
+\--subproject=frotz`.  So the above would be:
 
+. Fetch the head.
++
+------------
+$ git fetch --subproject=kernel/ git://git.kernel.org/.../linux-2.6/
+------------
++
+which would do:
+. fetch the commit chain from the remote repository.
+. write something like this to `FETCH_HEAD`:
++
+------------
+3ee68c4...\tfor-merge-into kernel/\tbranch 'master' of git://.../linux-2.6
+------------
+
+. Run `git merge`.
++
+------------
+$ git merge --subproject=kernel/ \
+    'Merge git://.../linux-2.6 into kernel/' HEAD 3ee68c4...
+------------
+
+. In case it does not cleanly automerge, `git merge` would write
+the necessary information for a later `git commit` to use in
+`MERGE_HEAD`.  It may look like this:
++
+------------
+3ee68c4af3fd7228c1be63254b9f884614f9ebb2	kernel/
+------------
+
+With this, a later invocation of `git commit` to record the
+result of hand resolving would be able to notice that:
+
+. We should be first resolving `kernel/` subproject.
+. The remote `HEAD` is `3ee68c4...` commit.
+. The merge message is `Merge git://.../linux-2.6 into kernel/`.
 
+and make a merge commit, and register that resulting commit in
+the index file using `update-index --bind` instead of updating
+*any* branch head (remember, we do not use separate branches to
+keep track of subproject heads anymore).

* Re: Notes on Subproject Support
  2006-01-23  4:36   ` Junio C Hamano
@ 2006-01-23  5:48     ` Junio C Hamano
  2006-01-23  6:06       ` Alexander Litvinov
                         ` (2 more replies)
  2006-01-23 16:31     ` Daniel Barkalow
  1 sibling, 3 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-23  5:48 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Petr Baudis

Junio C Hamano <junkio@cox.net> writes:

> In other words, I think the design so far does not require us to
> touch tree objects at all, and I'd be happy if we do not have to.
>
> One reason I started the bound commit approach was exactly
> because I only needed to muck with commit objects and did not
> have to touch trees and blobs; after trying to implement the
> core level for "gitlink", which I ended up touching quite a lot
> and have abandoned for now.

BTW, let's digress a bit.

I think recording "commit" in the tree objects is in line with
the logical organization described in README: "blob" and "tree"
represent a state, and have *nothing* to do with how we came
about to that state.  The history is described in "commit"
objects.  The bound commit approach keeps that property.

The "gitlink" approach, as I understand how Linus outlined in
his original suggestion, is a bit different.  The link objects
appear in tree objects, and when you "git cat-file link" one of
them, you would see something like this:

        commit	5b2bcc7b2d546c636f79490655b3347acc91d17f
        name	kernel

So in that sense, "gitlink" approach departs from the original
premise of "commit" being the only thing that ties things
together.  Tree objects with "gitlink" know where they are in
the history [*1*].
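
Reading such a link object would be straightforward.  The Python sketch
below assumes nothing beyond the two-field example shown above (a key, some
whitespace, a value per line); the `parse_link` helper is hypothetical, and
the real format, had it been implemented this way, might well have
differed.

```python
def parse_link(text):
    """Parse 'key<whitespace>value' lines of a link object into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split(None, 1)
            fields[key] = value.strip()
    return fields


link = parse_link(
    "commit\t5b2bcc7b2d546c636f79490655b3347acc91d17f\n"
    "name\tkernel\n"
)
```

The point of contrast is that this commit name is reached through a tree
entry, whereas a `bind` line lives in the commit header.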

By this, I do not mean to say that "gitlink" approach is
inferior because it breaks that original premise.  I am just
pointing it out as a difference between two approaches.

Now, the current way index file is used is as a staging area to
create a new commit on top of the tip of the current branch.
However, it is interesting to note that logically, by itself
*alone*, it cannot be used that way.  The information the index
file records is something that can be used to write out a tree
object, and not a commit, because it does not know where the
current state sits in the history.  We have two separate files,
$GIT_DIR/HEAD that records which branch we are on, and the
branch head ref the HEAD points at, which records where the
current index came from, for that purpose.  The latter tells us
what commit we should use as the parent commit if we create such
a commit, and the former tells us which branch head to update
once we create one.  So in that sense, the index file is just a
staging area to create a new tree, not a new commit.
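
The division of labour between the two files can be sketched in Python as
follows.  The `commit_context` helper and the dict of refs are stand-ins
invented for this example; the `ref: refs/heads/...` text is how a symbolic
`HEAD` is spelled.

```python
def commit_context(head_file, refs):
    """Return (parent commit, branch ref to update) for the next commit."""
    prefix = "ref: "
    if not head_file.startswith(prefix):
        raise ValueError("HEAD does not name a branch")
    branch_ref = head_file[len(prefix):].strip()
    # The branch ref supplies the parent; HEAD supplies the ref to update.
    return refs.get(branch_ref), branch_ref


refs = {"refs/heads/master": "04803b..."}
parent, branch_ref = commit_context("ref: refs/heads/master\n", refs)
```

Nothing in this lookup consults the index, which is the point being made:
the index alone yields a tree, and the parent has to come from elsewhere.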

We could have done things differently.  I am not advocating the
following change, but offering a possibility as a thought
experiment.  It just felt interesting enough to point out.

The index file could have recorded what commit the current state
recorded in the index came from.  By recording the commit the
index was read from in the index itself, independently from the
$GIT_DIR/refs/heads/$branch file, we could have allowed fetching
into the current branch.  When the $branch file for the current
branch was updated by a fast-forward fetch, we would notice that
the commit recorded there no longer matches what is recorded in
the index.

Another interesting consequence is if the development is a
single repository and linear, we did not even need any file in
$GIT_DIR/refs/ ("branchless git").  The commit recorded as the
topmost in the index file itself would have served as the tip of
the development, and we would have been able to tangle the
history starting from the commit in the index file.

While we are doing a thought experiment, let's say we allow
recording more than one commit that the current index is based upon.
'git merge' would record all the parent commits there, so that
writing out the merge result out of the index file as a tree and
then recording these commits as parents would have been the way
to create a merge commit.  We would not need the auxiliary file
$GIT_DIR/MERGE_HEAD if we did so.

In other words, if the index file recorded the commits its
contents were based upon, instead of being a staging area for a
new tree, it would have been a staging area for a new commit.
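
The thought experiment can be modelled in a few lines of Python.  The
`Index` class and both helpers are purely hypothetical; no such fields
exist in the real index file.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    tree: str                                     # what write-tree would produce
    based_on: list = field(default_factory=list)  # commits this state came from


def branch_moved(index, branch_tip):
    """Detect that the branch ref was updated (e.g. by a fetch) under us."""
    return bool(index.based_on) and index.based_on[0] != branch_tip


def new_commit_header(index):
    """Every recorded commit becomes a parent; no MERGE_HEAD file is needed."""
    lines = [f"tree {index.tree}"] + [f"parent {p}" for p in index.based_on]
    return "\n".join(lines)


index = Index(tree="t1", based_on=["c1"])
index.based_on.append("c2")  # what a merge would record in this model
header = new_commit_header(index)
```

With two recorded commits the header carries two `parent` lines, which is
exactly what a merge commit needs.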

Now, the latest proposal, borrowing your idea, records the
subproject commits bound to subdirectories in the index itself.
This is halfway to make the index file a staging area for the
next commit.  If we were to do that, we also *could* record the
commits the current index is based upon, so that it can truly be
used as a staging area to create a new commit, not just a tree.

On the other hand, this could be a reason *not* to do the
`update-index --bind` to record the subproject information in
the index file.  An auxiliary file such as $GIT_DIR/bind might
be sufficient, just like $GIT_DIR/MERGE_HEAD has been good
enough for us so far.  One difference between MERGE_HEAD and
bind is that the former is very transient -- only exists during
a merge while the latter is persistent while the top commit is
checked out and being worked on.

[Footnote]

*1* One good property of "gitlink" approach is that we *could*
extend this blob-like object to store arbitrary human readable
information to represent a point-in-time from an arbitrary
foreign SCM system.  IOW, we do not necessarily have to require
`commit` line that names a git commit to be there.  It could say
"Please slurp http://www.kernel.org/pub/software/.../git.tar.gz
and extract it in git/ directory".

Of course, for such a toplevel project commit, the tool may not
be able to do a checkout automatically and require the user to
cat-file the link, download a tarball and extract the subtree
there manually.

The bound commit approach requires you to have git commit object
names on the `bind` lines, and it is fundamentally much harder
to extend it to allow interfacing with foreign (non-)SCM
systems.

* Re: Notes on Subproject Support
  2006-01-23  5:48     ` Junio C Hamano
@ 2006-01-23  6:06       ` Alexander Litvinov
  2006-01-23 16:48         ` Daniel Barkalow
  2006-01-23  8:38       ` Junio C Hamano
  2006-01-23 17:57       ` Daniel Barkalow
  2 siblings, 1 reply; 17+ messages in thread
From: Alexander Litvinov @ 2006-01-23  6:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, git, Petr Baudis

In our development we have a slightly different case (simplified here):
1. We have a self-written C++ library that implements linked lists. Let's call 
it liblist.
2. We have project A, which uses liblist as a separate directory, and project B, 
which uses this library too.

Currently we have 3 cvs projects, with cvs-modules linking liblist into A and 
B. During development of A and B we often modify liblist to fix bugs, and 
these changes are immediately visible to all projects that use liblist.

Once the bind functionality is fully implemented, I see one restriction: I have 
to use one repo to store all three projects (A, B and liblist) to make 
changes to liblist visible to all projects. The alternative is to make separate 
repos and, on each change to liblist in project A, push these changes to the 
liblist repo and pull them into project B again, which is a bit of a hacky 
solution.

* Re: Notes on Subproject Support
  2006-01-23  1:35 Notes on Subproject Support Junio C Hamano
  2006-01-23  3:50 ` Daniel Barkalow
@ 2006-01-23  8:00 ` Junio C Hamano
  2006-01-23 12:50 ` Martin Atukunda
  2006-01-28  4:55 ` Horst von Brand
  3 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-23  8:00 UTC (permalink / raw)
  To: git

Junio C Hamano <junkio@cox.net> writes:

> This is still a draft/WIP, but "release early" is a good
> discipline, so...

Tentatively I'm placing this document in the 'todo' branch, so
that people interested in the changes can ask gitweb to show
diffs, until I can find a better way and location to manage it.

I do not think it is suitable to be in the Documentation/ area,
because it is an early draft and is just one of the proposals.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Notes on Subproject Support
  2006-01-23  5:48     ` Junio C Hamano
  2006-01-23  6:06       ` Alexander Litvinov
@ 2006-01-23  8:38       ` Junio C Hamano
  2006-01-23 17:57       ` Daniel Barkalow
  2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-23  8:38 UTC (permalink / raw)
  To: git

Junio C Hamano <junkio@cox.net> writes:

> BTW, let's digress a bit.

Ugh.  Serious typo.

> I think recording "commit" in the tree objects is in line with
> the logical organization described in README: "blob" and "tree"
> represent a state, and have *nothing* to do with how we came
> about to that state.  The history is described in "commit"
> objects.  The bound commit approach keeps that property.

Obviously, I think "*NOT* recording commit in tree objects" is
in line with "blobs and trees are about states, commits give
them their points in history".

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Notes on Subproject Support
  2006-01-23  1:35 Notes on Subproject Support Junio C Hamano
  2006-01-23  3:50 ` Daniel Barkalow
  2006-01-23  8:00 ` Junio C Hamano
@ 2006-01-23 12:50 ` Martin Atukunda
  2006-01-23 19:30   ` Junio C Hamano
  2006-01-28  4:55 ` Horst von Brand
  3 siblings, 1 reply; 17+ messages in thread
From: Martin Atukunda @ 2006-01-23 12:50 UTC (permalink / raw)
  To: git

This proposal doesn't seem to cater for the case where a directory is
renamed or moved to a different location, or am I missing something?

- Martin -
On Sun, Jan 22, 2006 at 05:35:14PM -0800, Junio C Hamano wrote:
> This is still a draft/WIP, but "release early" is a good
> discipline, so...
> 
> -- >8 --
> 
> Notes on Subproject Support
> ===========================
> Junio C Hamano <junkio@cox.net>
> v1.0 January 22, 2006
> 
> Scenario
> --------
> 
> The examples in the following discussion show how this proposal
> plans to help this:
> 
> . A project to build an embedded Linux appliance "gadget" is
>   maintained with git.
> 
> . The project uses linux-2.6 kernel as its subcomponent.  It
>   starts from a particular version of the mainline kernel, but
>   adds its own code and build infrastructure to fit the
>   appliance's needs.
> 
> . The working tree of the project is laid out this way:
> +
> ------------
>  Makefile       - Builds the whole thing.
>  linux-2.6/     - The kernel, perhaps modified for the project.
>  appliance/     - Applications that run on the appliance, and
>                   other bits.
> ------------
> 
> . The project is willing to maintain its own changes out of tree
>   of the Linux kernel project, but would want to be able to feed
>   the changes upstream, and incorporate upstream changes to its
>   own tree, taking advantage of the fact that both itself and
>   the Linux kernel project are version controlled with git.
> 
> The idea here is to:
> 
> . Keep `linux-2.6/` part as an independent project.  The work by
>   the project on the kernel part can be naturally exchanged with
>   the other kernel developers this way.  Specifically, a tree
>   object contained in commit objects belonging to this project
>   does *not* have linux-2.6/ directory at the top.
> 
> . Keep the `appliance/` part as another independent project.
>   Applications are supposed to be more or less independent from
>   the kernel version, but some other bits might be tied to a
>   specific kernel version.  Again, a tree object contained in
>   commit objects belonging to this project does *not* have
>   appliance/ directory at the top.
> 
> . Have another project that combines the whole thing together,
>   so that the project can keep track of which versions of the
>   parts are built together.
> 
> We will call the project that binds things together the
> 'toplevel project'.  Other projects that hold `linux-2.6/` part
> and `appliance/` part are called 'subprojects'.
> 
> Notice that `Makefile` at the top is part of the toplevel
> project in this example, but it is not necessary.  We could
> instead have the appliance subproject include this file.  In
> such a setup, the appliance subproject would have had `Makefile`
> and `appliance/` directory at the toplevel.
> 
> 
> Setting up
> ----------
> 
> Let's say we have been working on the appliance software,
> independently version controlled with git.  Also the kernel part
> has been version controlled separately, like this:
> ------------
> $ ls -dF current/*/.git current/*
> current/Makefile    current/appliance/.git/  current/linux-2.6/.git/
> current/appliance/  current/linux-2.6/
> ------------
> 
> Now we would want to get a combined project.  First we would
> clone from these repositories (which is not strictly needed --
> we could use `$GIT_ALTERNATE_OBJECT_DIRECTORIES` instead):
> 
> ------------
> $ mkdir combined && cd combined
> $ cp ../current/Makefile .
> $ git init-db
> $ mkdir -p .git/refs/subs/{kernel,gadget}/{heads,tags}
> $ git clone-pack ../current/linux-2.6/ master | read kernel_commit junk
> $ git clone-pack ../current/appliance/ master | read gadget_commit junk
> ------------
> 
> We will introduce a new command to set up a combined project:
> 
> ------------
> $ git bind-projects \
> 	$kernel_commit linux-2.6/ \
> 	$gadget_commit appliance/
> ------------
> 
> This would do an equivalent of:
> 
> ------------
> $ git read-tree --prefix=linux-2.6/ $kernel_commit
> $ git read-tree --prefix=appliance/ $gadget_commit
> ------------
> [NOTE]
> ============
> Earlier outlines sent to the git mailing list talked
> about `$GIT_DIR/bind` to record what subproject are bound to
> which subtree in the current working tree and index.  This
> proposal instead records that information in the index file
> when `--prefix=linux-2.6/` is given to `read-tree`.
> 
> Also note that in this round of proposal, there is no separate
> branches that keep track of heads of subprojects.
> ============
> 
> Let's not forget to add the `Makefile`, and check the whole
> thing out from the index file.
> ------------
> $ git add Makefile
> $ git checkout-index -f -u -q -a
> ------------
> 
> Now our directory should be identical with the `current`
> directory.  After making sure of that, we should be able to
> commit the whole thing:
> 
> ------------
> $ diff -x .git -r ../current ../combined
> $ git commit -m 'Initial toplevel project commit'
> ------------
> 
> Which should create a new commit object that records what is in
> the index file as its tree, with `bind` lines to record which
> subproject commit objects are bound at what subdirectory, and
> updates the `$GIT_DIR/refs/heads/master`.  Such a commit object
> might look like this:
> ------------
> tree 04803b09c300c8325258ccf2744115acc4c57067
> bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
> bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
> author Junio C Hamano <junio@kernel.org> 1137965565 -0800
> committer Junio C Hamano <junio@kernel.org> 1137965565 -0800
> 
> Initial toplevel project commit
> ------------
> 
> 
> Making further commits
> ----------------------
> 
> The easiest case is when you updated the Makefile without
> changing anything in the subprojects.  In such a case, we just
> need to create a new commit object that records the new tree
> with the current `HEAD` as its parent, and with the same set of
> `bind` lines.
> 
> When we have changes to the subproject part, we would make a
> separate commit to the subproject part and then record the whole
> thing by making a commit to the toplevel project.  The user
> interaction might go this way:
> ------------
> $ git commit
> error: you have changes to the subproject bound at linux-2.6/.
> $ git commit --subproject linux-2.6/
> $ git commit
> ------------
> 
> With the new `--subproject` option, the directory structure
> rooted at `linux-2.6/` part is written out as a tree, and a new
> commit object that records that tree object with the commit
> bound to that portion of the tree (`5b2bcc7b` in the above
> example) as its parent is created.  Then the final `git commit`
> would record the whole tree with updated `bind` line for the
> `linux-2.6/` part.
> 
> 
> Checking out
> ------------
> 
> After cloning such a toplevel project, `git clone` without `-n`
> option would check out the working tree.  This is done by
> reading the tree object recorded in the commit object (which
> records the whole thing), and adding the information from the
> "bind" line to the index file.
> 
> ------------
> $ cd ..
> $ git clone -n combined cloned ;# clone the one we created earlier
> $ cd cloned
> $ git checkout
> ------------
> 
> This round of proposal does not maintain separate branch heads
> for subprojects.  The bound commits and their subdirectories
> are recorded in the index file from the commit object, so there
> is no need to do anything other than updating the index and the
> working tree.
> 
> 
> Switching branches
> ------------------
> 
> Along with the traditional two-way merge by `read-tree -m -u`,
> we would need to look at:
> 
> . `bind` lines in the current `HEAD` commit.
> 
> . `bind` lines in the commit we are switching to.
> 
> . subproject binding information in the index file.
> 
> to make sure we do sensible things.
> 
> Just like until very recently we did not allow switching
> branches when two-way merge would lose local changes, we can
> start by refusing to switch branches when the subprojects bound
> in the index do not match what is recorded in the `HEAD` commit.
> 
> Because in this round of the proposal we do not use the
> `$GIT_DIR/bind` file nor separate branches to keep track of
> heads of the subprojects, there is nothing else other than the
> working tree and the index file that needs to be updated when
> switching branches.
> 
> 
> Merging
> -------
> 
> Merging two branches of the toplevel projects can use the
> traditional merging mechanism mostly unchanged.  The merge base
> computation can be done using the `parent` ancestry information
> taken from the two toplevel project branch heads being merged,
> and merging of the whole tree can be done with a three-way merge
> of the whole tree using the merge base and two head commits.
> For reasons described later, we would not merge the subproject
> parts of the trees during this step, though.
> 
> When the two branch heads use different versions of subproject,
> things get a bit tricky.  First, let's forget for a moment about
> the case where they bind the same project at different location.
> We would refuse if they do not have the same number of `bind`
> lines that bind something at the same subdirectories.
> 
> ------------
> $ git merge 'Merge in a side branch' HEAD side
> error: the merged heads have subprojects bound at different places.
>  ours:
> 	linux-2.6/
> 	appliance/
>  theirs:
> 	kernel/
> 	gadget/
> 	manual/
> ------------
> 
> Such renaming can be handled by first moving the bind points in
> our branch, and redoing the merge (this is a rare operation
> anyway).  It might go like this:
> 
> ------------
> $ git bind-projects \
> 	$kernel_commit kernel/ \
> 	$gadget_commit gadget/
> $ git commit -m 'Prepare for merge with side branch'
> $ git merge 'Merge in a side branch' HEAD side
> error: the merged heads have subprojects bound at different places.
>  ours:
> 	kernel/
> 	gadget/
>  theirs:
> 	kernel/
> 	gadget/
> 	manual/
> ------------
> 
> Their branch added another subproject, so this did not work (or
> it could be the other way around -- we might have been the one
> with `manual/` subproject while they didn't).  This suggests
> that we may want an option to `git merge` to allow taking a
> union of subprojects.  Again, this is a rare operation, and
> always taking a union would have created a toplevel project that
> had both `kernel/` and `linux-2.6/` bound to the same Linux
> kernel project from possibly different vintage, so it would be
> prudent to require the set of bound subprojects to exactly match
> and give the user an option to take a union.
> 
> ------------
> $ git merge --union-subprojects 'Merge in a side branch' HEAD side
> error: the subproject at `kernel/` needs to be merged first.
> ------------
> 
> Here, the version of the Linux kernel project in the `side`
> branch was different from what our branch had on our `bind`
> line.  On what kind of difference should we give this error?
> Initially, I think we could require one is the fast forward of
> the other (ours might be ahead of theirs, or the other way
> around), and take the descendant.
> 
> Or we could do an independent merge of subprojects heads, using
> the `parent` ancestry of the bound subproject heads to find
> their merge-base and doing a three-way merge.  This would leave
> the merge result in the subproject part of the working tree and
> the index.
> 
> [NOTE]
> This is the reason we did not do the whole-tree three way merge
> earlier.  The subproject commit bound to the merge base commit
> used for the toplevel project may not be the merge base between
> the subproject commits bound to the two toplevel project
> commits.
> 
> So let's deal with the case to merge only a subproject part into
> our tree first.
> 
> 
> Merging subprojects
> -------------------
> 
> An operation of more practical importance is to be able to merge
> in changes done outside to the projects bound to our toplevel
> project.
> 
> ------------
> $ git pull --subproject=kernel/ git://git.kernel.org/.../linux-2.6/
> ------------
> 
> might do:
> 
> . fetch the current `HEAD` commit from Linus.
> . find the subproject commit bound at kernel/ subtree.
> . perform the usual three-way merge of these two commits, in
>   `kernel/` part of the working tree.
> 
> After that, `git commit \--subproject` option would be needed to
> make a commit.
> 
> [NOTE]
> This suggests that we would need to have something similar to
> `MERGE_HEAD` for merging the subproject part.
> 
> 

-- 
Due to a shortage of devoted followers, the production of great leaders has been discontinued.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Notes on Subproject Support
  2006-01-23  4:36   ` Junio C Hamano
  2006-01-23  5:48     ` Junio C Hamano
@ 2006-01-23 16:31     ` Daniel Barkalow
  2006-01-24  1:50       ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Barkalow @ 2006-01-23 16:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, 22 Jan 2006, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> >> Switching branches
> >> ------------------
> >> 
> >> Along with the traditional two-way merge by `read-tree -m -u`,
> >> we would need to look at:
> >> 
> >> . `bind` lines in the current `HEAD` commit.
> >> 
> >> . `bind` lines in the commit we are switching to.
> >> 
> >> . subproject binding information in the index file.
> >> 
> >> to make sure we do sensible things.
> >
> > This is one place I think storing the bindings in the commit is awkward. 
> > read-tree deals in trees (hence the name), but will need information from 
> > the commit.
> 
> That's why it is 'along with'.  Dealing with binding information
> can be done between commits and index without bothering tree
> objects.  read-tree would not have to deal with it, and I think
> keeping it that way is probably a good idea.

I think it would be a lot more fragile if switching branches requires 
multiple programs interacting with the index file. If things get 
interrupted after the tree is read but before the bindings are changed, 
the user will probably generate an inconsistent commit or have to deal 
with figuring out what's going on. It is a nice property of the current 
system that the index file never exists under the usual filename without 
being consistent.
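
The property described here can be illustrated with a write-then-rename
sketch.  This is only an illustration under assumptions: `index` below is
an ordinary text file standing in for the real index, and its contents are
made up.  The point is that a complete new state is built under a temporary
name (git itself uses a `.lock` suffix for this) and only then moved into
place, so the usual filename never holds a half-updated mixture.

```shell
#!/bin/sh
# Sketch only: "index" is a plain text file standing in for the real
# index; the record names inside it are illustrative assumptions.
set -e
workdir=$(mktemp -d)
cd "$workdir"

printf 'tree-state: old\nbindings: old\n' >index

# Build the complete new state under a temporary name first.  A crash
# during this step leaves the old "index" untouched.
{
	printf 'tree-state: new\n'
	printf 'bindings: new\n'
} >index.lock

# A rename within one directory replaces the target in a single step,
# so readers see either the old file or the new one.
mv index.lock index

cat index
```

If the tree state and the bindings were written by two separate programs,
each doing its own rename, the window between the two renames is where the
inconsistent state discussed above could become visible.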

> In other words, I think the design so far does not require us to
> touch tree objects at all, and I'd be happy if we do not have to.
>
> One reason I started the bound commit approach was exactly
> because I only needed to muck with commit objects and did not
> have to touch trees and blobs; after trying to implement the
> core level for "gitlink", which I ended up touching quite a lot
> and have abandoned for now.

I think that the same thing that worked with the index file would also 
minimize the impact of the changes as far as the rest of the code can see. 
If the struct tree parser reported the tree at a path when it found a commit 
at that path, it would work just as if the original tree had used trees (like 
in your proposal).

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Notes on Subproject Support
  2006-01-23  6:06       ` Alexander Litvinov
@ 2006-01-23 16:48         ` Daniel Barkalow
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Barkalow @ 2006-01-23 16:48 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git, Petr Baudis

On Mon, 23 Jan 2006, Alexander Litvinov wrote:

> In our development we have a slightly different use case (simplified here):
> 1. We have a self-written C++ library implementing linked lists. Let's call 
> it liblist.
> 2. We have project A, which uses liblist as a separate directory, and project B, 
> which uses this library too.
> 
> Currently we have 3 cvs projects, with cvs-modules linking liblist into A and 
> B. During development of A and B we often modify liblist to fix bugs, and 
> these changes are immediately visible to all projects that use liblist.
> 
> With the bind functionality fully implemented, I see one restriction: I have 
> to use one repo to store all three projects (A, B and liblist) to make 
> changes to liblist visible to all projects. The alternative is to make separate 
> repos and, on each change to liblist in project A, push these changes to the 
> liblist repo and pull them into project B again, which is a bit of a hacky solution.

We haven't yet discussed how pushing a repository with subprojects would 
work. We could probably have an extra line in the remotes/ file to make 
the desired thing happen automatically, so that you can just do "git push" 
and have the subproject go to its repository and the main project go to 
its repository.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Notes on Subproject Support
  2006-01-23  5:48     ` Junio C Hamano
  2006-01-23  6:06       ` Alexander Litvinov
  2006-01-23  8:38       ` Junio C Hamano
@ 2006-01-23 17:57       ` Daniel Barkalow
  2006-01-24  1:50         ` Junio C Hamano
  2 siblings, 1 reply; 17+ messages in thread
From: Daniel Barkalow @ 2006-01-23 17:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Petr Baudis

On Sun, 22 Jan 2006, Junio C Hamano wrote:

> Junio C Hamano <junkio@cox.net> writes:
> 
> BTW, let's digress a bit.
> 
> I think recording "commit" in the tree objects is in line with
> the logical organization described in README: "blob" and "tree"
> represent a state, and have *nothing* to do with how we came
> about to that state.  The history is described in "commit"
> objects.  The bound commit approach keeps that property.
>
> The "gitlink" approach, as I understand how Linus outlined in
> his original suggestion, is a bit different.  The link objects
> appear in tree objects, and when you "git cat-file link" one of
> them, you would see something like this:
> 
>         commit	5b2bcc7b2d546c636f79490655b3347acc91d17f
>         name	kernel
> 
> So in that sense, "gitlink" approach departs from the original
> premise of "commit" being the only thing that ties things
> together.  Tree objects with "gitlink" know where they are in
> the history [*1*].
> 
> By this, I do not mean to say that "gitlink" approach is
> inferior because it breaks that original premise.  I am just
> pointing it out as a difference between two approaches.

I think it's hard to say whether the history of a subproject is part of 
the state of the superproject or part of its history. They're certainly 
not the same history, because the superproject history may record that 
the superproject switched from one fork of a subproject to a different 
fork, or reverted the subproject to an earlier version, or other such 
things. (It's the whole data/metadata issue: when you take a step back, 
one level's data and metadata are all data, and there's more stuff that's 
metadata.)

I'd say that with either commits in trees or gitlink objects, it's still 
only the commits that tie things together; but some of the things that 
they tie together are opaquely things tied to other things. Tree objects 
with gitlink/commits don't know where *they* are in the history; they 
just happen to store things which know about a different history.

> Now, the current way index file is used is as a staging area to
> create a new commit on top of the tip of the current branch.
> However, it is interesting to note that logically, by itself
> *alone*, it cannot be used that way.  The information the index
> file records is something that can be used to write out a tree
> object, and not a commit, because it does not know where the
> current state sits in the history.  We have two separate files,
> $GIT_DIR/HEAD that records which branch we are on, and the
> branch head ref the HEAD points at, which records where the
> current index came from, for that purpose.  The latter tells us
> what commit we should use as the parent commit if we create such
> a commit, and the former tells us which branch head to update
> once we create one.  So in that sense, the index file is just a
> staging area to create a new tree, not a new commit.

Of course, we don't strictly need $GIT_DIR/HEAD to create a commit; 
that's only needed for what we generally do with the commit once we have 
it. We do need the branch head ref (or, more abstractly, the commit that 
we read to generate the index we modified) in order to create the commit.
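
The two-step lookup under discussion can be sketched as follows.  This is a
simplified illustration, not how any git command is implemented: a hand-made
directory mimics the `$GIT_DIR` layout, HEAD is assumed to be a `ref:` file
rather than a symlink, and the commit name is just the example one used
earlier in the thread.

```shell
#!/bin/sh
# Sketch only: a hand-made directory mimics the layout of $GIT_DIR.
# The commit name below is an example value, not a real object here.
set -e
gitdir=$(mktemp -d)
mkdir -p "$gitdir/refs/heads"
echo 'ref: refs/heads/master' >"$gitdir/HEAD"
echo '5b2bcc7b2d546c636f79490655b3347acc91d17f' >"$gitdir/refs/heads/master"

# Step 1: HEAD says which branch head a new commit should advance.
branch_ref=$(sed -e 's/^ref: //' "$gitdir/HEAD")

# Step 2: the branch head ref says which commit becomes the parent.
parent=$(cat "$gitdir/$branch_ref")

echo "would update: $branch_ref"
echo "parent commit: $parent"
```

Only the second value is needed to build the commit object itself; the
first matters once we want to record where the new commit should go.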

> We could have done things differently.  I am not advocating to
> do the following change, but offering a possibility as a thought
> experiment.  It just felt interesting enough to point them out.
> 
> The index file could have recorded what commit the current state
> recorded in the index came from.  By recording the commit the
> index was read from in the index itself, independently from the
> $GIT_DIR/refs/heads/$branch file, we could have been able to
> allow fetching into the current branch.  When the $branch file
> for the current branch was updated by a fast-forward fetch, we
> would notice that the commit recorded there no longer matches what
> is recorded in the index.

I actually think this would have been a good idea. I think we've had 
approximately every possible bug that could come from inconsistency 
between the files that give the parents and the index file. (I think Linus 
didn't do it that way initially just because he was thinking of it as a 
cache, and there's little point in caching something small, and by the 
time we started looking at it as primary information on its own, we'd 
stopped thinking about what should go in it.)
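
As a sketch of the check this thought experiment would enable: the real
index does not record a base commit, so two plain files stand in here for
"the commit the index was read from" and for the branch head ref.  The file
names, the function name and the object names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical: real index files do not record a base commit.  Two plain
# files stand in for "commit recorded in the index" and the branch ref.
set -e
dir=$(mktemp -d)
old=1111111111111111111111111111111111111111
new=2222222222222222222222222222222222222222

echo "$old" >"$dir/index-base"     # commit the index was read from
echo "$old" >"$dir/branch-head"    # stands in for refs/heads/$branch

check_index_base () {
	if [ "$(cat "$dir/index-base")" = "$(cat "$dir/branch-head")" ]
	then
		echo consistent
	else
		echo stale
	fi
}

before=$(check_index_base)
echo "$new" >"$dir/branch-head"    # simulate a fast-forward fetch
after=$(check_index_base)
echo "before fetch: $before, after fetch: $after"
```

With the base recorded next to the staged contents, a fetch into the
current branch shows up as a detectable mismatch instead of a silent one.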

> Another interesting consequence is if the development is a
> single repository and linear, we did not even need any file in
> $GIT_DIR/refs/ ("branchless git").  The commit recorded as the
> topmost in the index file itself would have served as the tip of
> the development, and we would have been able to tangle the
> history starting from the commit in the index file.

Well, you wouldn't be able to check out an old version and then return to 
the present without dredging the objects database for the dangling commit. 
My memory has gotten fuzzy, but I think HEAD may have originally been just 
a file, and we effectively had this (except that HEAD and the index were 
not the same file as far as the filesystem was concerned).

> While we are doing a thought experiment, let's say we allow the
> index to record more than one commit it is based upon.
> 'git merge' would record all the parent commits there, so that
> writing out the merge result out of the index file as a tree and
> then recording these commits as parents would have been the way
> to create a merge commit.  We would not need the auxiliary file
> $GIT_DIR/MERGE_HEAD if we did so.
>
> In other words, if the index file recorded the commits its
> contents were based upon, instead of being a staging area for a
> new tree, it would have been a staging area for a new commit.
> 
> Now, the latest proposal, borrowing your idea, records the
> subproject commits bound to subdirectories in the index itself.
> This is halfway to make the index file a staging area for the
> next commit.  If we were to do that, we also *could* record the
> commits the current index is based upon, so that it can truly be
> used as a staging area to create a new commit, not just a tree.
> 
> On the other hand, this could be a reason *not* to do the
> `update-index --bind` to record the subproject information in
> the index file.  An auxiliary file such as $GIT_DIR/bind might
> be sufficient, just like $GIT_DIR/MERGE_HEAD has been good
> enough for us so far.  One difference between MERGE_HEAD and
> bind is that the former is very transient -- only exists during
> a merge while the latter is persistent while the top commit is
> checked out and being worked on.

We've been able to make MERGE_HEAD work, but I remember there being 
problems even there when people tried to abandon merges by changing 
branches. Do you see an advantage to having the index only record the 
information used for making a tree, and keeping the information for making 
a commit in other files?

> [Footnote]
> 
> *1* One good property of "gitlink" approach is that we *could*
> extend this blob-like object to store arbitrary human readable
> information to represent a point-in-time from an arbitrary
> foreign SCM system.  IOW, we do not necessarily have to require
> `commit` line that names a git commit to be there.  It could say
> "Please slurp http://www.kernel.org/pub/software/.../git.tar.gz
> and extract it in git/ directory".
> 
> Of course, for such a toplevel project commit, the tool may not
> be able to do a checkout automatically and require the user to
> cat-file the link, download a tarball and extract the subtree
> there manually.
> 
> The bound commit approach requires you to have git commit object
> names on the `bind` lines, and it is fundamentally much harder
> to extend it to allow interfacing with foreign (non-)SCM
> systems.

I don't think this would really be useful. The reason to have the included 
revision stored in a way that's explicitly marked for git to find is that 
git can do useful things with the information (such as checking it out for 
you, but more importantly, making sure that changes to what revision 
you're working with propagate to changes in what revision you specify 
should be there). If the bound project is foreign, this clearly isn't 
going to happen, so there's not much point. For your example above, you 
could just have a regular file, "git/README", with the content "Please 
download http://.../git.tar.gz, and extract it here", and it would be at 
least as good.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Notes on Subproject Support
  2006-01-23 12:50 ` Martin Atukunda
@ 2006-01-23 19:30   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-23 19:30 UTC (permalink / raw)
  To: Martin Atukunda; +Cc: git

Martin Atukunda <matlads@dsmagic.com> writes:

> This proposal doesn't seem to cater for the case where a directory is
> renamed or moved to a different location, or am I missing something?

First of all, please do not top post.

Second of all, please do not quote the whole thing.

Third of all, if you quote, please read the parts you quote.

>> Merging
>> -------
>> ...
>> Such renaming can be handled by first moving the bind points in
>> our branch, and redoing the merge (this is a rare operation
>> anyway).  It might go like this:
>> ...

This step describes how bind-point might be relocated prior to a
merge.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Notes on Subproject Support
  2006-01-23 16:31     ` Daniel Barkalow
@ 2006-01-24  1:50       ` Junio C Hamano
  2006-01-24  4:22         ` Daniel Barkalow
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-01-24  1:50 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> I think it would be a lot more fragile if switching branches requires 
> multiple programs interacting with the index file. If things get 
> interrupted after the tree is read but before the bindings are changed, 
> the user will probably generate an inconsistent commit or have to deal 
> with figuring out what's going on. It is a nice property of the current 
> system that the index file never exists under the usual filename without 
> being consistent.

That is certainly an issue, which we have had already for quite
some time, I am afraid.  We can get interrupted during "switch
branches" flow after read-tree -u -m but before updating HEAD.
We can also get interrupted during "commit" flow after writing
the commit object out before updating the ref pointed at by
HEAD.  No?

If we are truly serious about solving the issue of getting
interrupted in the middle, I suspect we have to take the "index
is a staging area for the next commit" approach I digressed into
last night.  It would involve introducing a git-atomic-checkout
command to replace the current "git-rev-parse, git-read-tree,
then git-symbolic-ref" sequence in the checkout flow.  In the
commit flow, we would need git-commit-index command to replace
the current "git-write-tree, git-commit-tree, then
git-update-ref" sequence.
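
For reference, the current three-step commit flow can be sketched with the
existing plumbing commands, spelled here in the `git <command>` form.  The
throwaway repository, the file contents and the identity are assumptions
made only so that the sketch runs on its own.

```shell
#!/bin/sh
# Sketch of the three-step commit flow using existing plumbing commands.
# The repository, file name and identity below are throwaway assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

echo 'all:' >Makefile
git update-index --add Makefile

# Step 1: write the index out as a tree object.
tree=$(git write-tree)

# Step 2: wrap the tree in a commit object (a root commit, so no -p).
# An interruption after this step leaves a commit that no ref points at.
commit=$(echo 'Initial commit' | git commit-tree "$tree")

# Step 3: only now does the branch that HEAD points at move.
git update-ref HEAD "$commit"

git rev-parse HEAD
```

The gap between steps 2 and 3 is the window mentioned above; a single
command that did all three would have to close it internally.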

I am not particularly opposed to that, but I suspect it might be
a moderate amount of work for very little gain.  Continuing with
the digression, the updated index file may contain:

  1. list of <blob path, object name>
  2. list of parent commit object names for the next commit
  3. the name of the local branch to create the next commit on
  4. for each bound path:
     list of parent commit object names for that path.

1. is what we have in the current (version 2) index file.

2. contains:
 - 0 commit in an index file before the initial commit
 - 1 commit in an index file after a fresh checkout and records
   the commit object name we checked out (replaces HEAD+heads/$branch)
 - 2 commits in an index file after 3-way read-tree, or more
   during an Octopus merge (replaces HEAD+MERGE_HEAD)

3. may not be needed, but if we did so, it would replace HEAD.

4. is similar to 2 but for bound subprojects.  Usually we have
   one commit per bound path to record the "bind" line commit we
   read from the commit object after a fresh checkout.  During a
   subproject merge, we would:
   - start out with 1 commit read from the "bind" line;
   - merging in another subproject commit would add that commit;
   - when making a new subproject commit, the recorded commits
     are used as its parents.
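
The four lists above can be modelled as a small data structure.  What
follows is only a sketch under stated assumptions: the class name, the
field names and the three helper functions are hypothetical and do not
correspond to any existing index file format or git command.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProposedIndex:
    # 1. blob path -> object name (what the current index already records)
    entries: Dict[str, str] = field(default_factory=dict)
    # 2. parent commit object names for the next commit
    parents: List[str] = field(default_factory=list)
    # 3. local branch to create the next commit on (may not be needed)
    branch: Optional[str] = None
    # 4. bound path -> parent commit object names for that path
    bound_parents: Dict[str, List[str]] = field(default_factory=dict)


def fresh_checkout(commit: str, branch: str, binds: Dict[str, str]) -> ProposedIndex:
    """One parent for the project, one per bound path (from the "bind" lines)."""
    return ProposedIndex(
        parents=[commit],
        branch=branch,
        bound_parents={path: [bound] for path, bound in binds.items()},
    )


def merge_in(index: ProposedIndex, other: str) -> None:
    """A 3-way read-tree leaves two parents; an Octopus merge adds more."""
    if other not in index.parents:
        index.parents.append(other)


def merge_subproject(index: ProposedIndex, path: str, other: str) -> None:
    """Merging another subproject commit adds it to that path's parents."""
    if other not in index.bound_parents[path]:
        index.bound_parents[path].append(other)
```

An index built by ProposedIndex() with no arguments has zero parents,
which is the "before the initial commit" case.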

* Re: Notes on Subproject Support
  2006-01-23 17:57       ` Daniel Barkalow
@ 2006-01-24  1:50         ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-24  1:50 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> ... Do you see an advantage to having the index only record the 
> information used for making a tree, and keeping the information for making 
> a commit in other files?

If somebody else had already done the work and presented me with
two git implementations, one whose index file can only generate a
tree and that keeps the other information needed for commits in
separate files, and the other whose index file holds everything
needed for a commit, I'd certainly take the latter.  In that
sense, I do not see such an advantage at all.  The practical
advantage of keeping them separate is to keep things simple,
minimizing the changes.  I see the subproject support as a
secondary issue, and so far I haven't found a reason convincing
enough to tell me that it is better to put HEAD+heads/$branch
information in the index itself when used in a subproject-less
setup.  It perhaps would make us more robust when interrupted in
the middle of switching branches or making a commit, but that is
about it (I do not particularly see that as a serious problem).

>> *1* One good property of "gitlink" approach is that we *could*
>> extend this blob-like object to store arbitrary human readable
>> information to represent a point-in-time from an arbitrary
>> foreign SCM system.  IOW, we do not necessarily have to require
>> a `commit` line that names a git commit to be there.  It could say
>> "Please slurp http://www.kernel.org/pub/software/.../git.tar.gz
>> and extract it in git/ directory".
>> ...
> I don't think this would really be useful. The reason to have the included 
> revision stored in a way that's explicitly marked for git to find is that 
> git can do useful things with the information ...
> but more importantly, making sure that changes to what revision 
> you're working with propagate to changes in what revision you specify 
> should be there)...

My example was taking things to the extreme to be illustrative.

To be more practical, it could have pointed at "git-1.0.tar.gz"
or an "svn://" URL with explicit revision name, which ought to
be enough to recreate the exact state.

* Re: Notes on Subproject Support
  2006-01-24  1:50       ` Junio C Hamano
@ 2006-01-24  4:22         ` Daniel Barkalow
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Barkalow @ 2006-01-24  4:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, 23 Jan 2006, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > I think it would be a lot more fragile if switching branches requires 
> > multiple programs interacting with the index file. If things get 
> > interrupted after the tree is read but before the bindings are changed, 
> > the user will probably generate an inconsistent commit or have to deal 
> > with figuring out what's going on. It is a nice property of the current 
> > system that the index file never exists under the usual filename without 
> > being consistent.
> 
> That is certainly an issue, which we have had already for quite
> some time, I am afraid.  We can get interrupted during "switch
> branches" flow after read-tree -u -m but before updating HEAD.
> We can also get interrupted during "commit" flow after writing
> the commit object out but before updating the ref pointed at by
> HEAD.  No?

The switch branches one is accurate, but I think that, if we get 
interrupted before updating the ref, the index will still be the same, and 
we'll just have a dangling object (which, if we commit the same thing 
again, will be the same object we generate).
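
A minimal sketch of why re-creating the commit gives the same object:
an object's name is the SHA-1 of a short header plus its contents, so
identical bytes always yield an identical name.  The helper names and
the author identity below are made up for illustration.  Note that the
names match only when every byte matches, including the timestamps
recorded in the commit.

```python
import hashlib
from typing import List

# Well-known name of the empty tree object.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def object_name(obj_type: str, body: bytes) -> str:
    """Name an object as SHA-1 over "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: List[str], when: int, message: str) -> bytes:
    """Build the text of a commit object (hypothetical author identity)."""
    ident = f"A U Thor <author@example.com> {when} +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


# The interrupted attempt and the retry, built from the same inputs.
first = object_name("commit", commit_body(EMPTY_TREE, [], 1138000000, "msg"))
retry = object_name("commit", commit_body(EMPTY_TREE, [], 1138000000, "msg"))
# A retry one second later records a different timestamp.
later = object_name("commit", commit_body(EMPTY_TREE, [], 1138000001, "msg"))
```

Here first and retry are equal, while later differs, so the first
attempt stays behind as a dangling object in that case.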

I suppose the existing branch switching isn't much less bad than the new 
one would be, though. I sort of worry that rewriting the index file is 
more likely to be interrupted than updating a ref, but that's probably not 
really a significant difference.
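
The property quoted above, that the index never exists under its usual
filename without being consistent, comes from writing the new contents
to a lock file and renaming it into place.  A sketch of that technique,
with a hypothetical function name and a throwaway directory standing in
for a real repository:

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to path + ".lock", then rename it over path."""
    lock_path = path + ".lock"
    # O_EXCL makes this fail if another writer already holds the lock.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a mix.
        os.replace(lock_path, path)
    except BaseException:
        # Interrupted or failed: drop the partial lock file, keep the old one.
        if os.path.exists(lock_path):
            os.unlink(lock_path)
        raise


def demo() -> bytes:
    """Replace a file twice and return what a reader sees afterwards."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "index")
        write_atomically(path, b"old")
        write_atomically(path, b"new")
        with open(path, "rb") as f:
            return f.read()
```

An interruption before the rename leaves the old file untouched, however
long the write itself takes; what it cannot cover is a second file, such
as a ref, that has to change together with the index.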

> If we are truly serious about solving the issue of getting
> interrupted in the middle, I suspect we have to take the "index
> is a staging area for the next commit" approach I digressed into
> last night.  It would involve introducing a git-atomic-checkout
> command to replace the current "git-rev-parse, git-read-tree,
> then git-symbolic-ref" sequence in the checkout flow. 

Well, if the value of HEAD were in the index file, that would be 
sufficient to prevent anything actually bad from happening in the checkout 
path; if it gets interrupted, the index file's "current commit" field 
would then not match the ref and it would be clear that the system was in 
an intermediate state. (It would look as if you'd fetched into the 
current branch without it doing the fast-forward.)
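
A toy model of that check, assuming the index carries a "current commit"
field.  The Repo class and its methods are hypothetical and only mimic
the order of the two steps in the checkout path:

```python
from typing import Dict


class Repo:
    def __init__(self, refs: Dict[str, str], branch: str) -> None:
        self.refs = dict(refs)           # branch name -> commit name
        self.head = branch               # branch that HEAD points at
        self.index_commit = refs[branch] # "current commit" field in the index

    def checkout(self, branch: str, interrupt: bool = False) -> None:
        # Step 1: rewrite the index, including its "current commit" field.
        self.index_commit = self.refs[branch]
        if interrupt:
            return  # killed before HEAD was updated
        # Step 2: point HEAD at the new branch.
        self.head = branch

    def is_consistent(self) -> bool:
        # A mismatch means a checkout was interrupted between the two steps.
        return self.index_commit == self.refs[self.head]


repo = Repo({"master": "aaa", "topic": "bbb"}, "master")
repo.checkout("topic", interrupt=True)
interrupted_ok = repo.is_consistent()  # index says "bbb", HEAD's ref says "aaa"
repo.checkout("topic")
finished_ok = repo.is_consistent()
```

The mismatch is only detectable when the two branches point at different
commits; if they point at the same commit there is nothing to repair
except HEAD itself.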

> In the commit flow, we would need git-commit-index command to replace
> the current "git-write-tree, git-commit-tree, then
> git-update-ref" sequence.

I don't think there's an issue here, anyway.

> I am not particularly opposed to that, but I suspect it might be
> a moderate amount of work for very little gain.  Continuing with
> the digression, the updated index file may contain:
> 
>   1. list of <blob path, object name>
>   2. list of parent commit object names for the next commit
>   3. the name of the local branch to create the next commit on
>   4. for each bound path:
>      list of parent commit object names for that path.
> 
> 1. is what we have in the current (version 2) index file.
> 
> 2. contains:
>  - 0 commits in an index file before the initial commit
>  - 1 commit in an index file after a fresh checkout and records
>    the commit object name we checked out (replaces HEAD+heads/$branch)
>  - 2 commits in an index file after 3-way read-tree, or more
>    during an Octopus merge (replaces HEAD+MERGE_HEAD)
> 
> 3. may not be needed, but if we did so, it would replace HEAD.

I don't think it would be needed; it could certainly be passed in. 
Actually not having HEAD would complicate a lot of programs that use HEAD 
but don't currently read the index (and don't actually care about whether 
you have the branch that you consider current actually checked out).

> 4. is similar to 2 but for bound subprojects.  Usually we have
>    one commit per bound path to record the "bind" line commit we
>    read from the commit object after a fresh checkout.  During a
>    subproject merge, we would:
>    - start out with 1 commit read from the "bind" line;
>    - merging in another subproject commit would add that commit;
>    - when making a new subproject commit, the recorded commits
>      are used as its parents.

Right.

	-Daniel
*This .sig left intentionally blank*

* Re: Notes on Subproject Support
  2006-01-23  1:35 Notes on Subproject Support Junio C Hamano
                   ` (2 preceding siblings ...)
  2006-01-23 12:50 ` Martin Atukunda
@ 2006-01-28  4:55 ` Horst von Brand
  2006-01-28 21:43   ` Junio C Hamano
  3 siblings, 1 reply; 17+ messages in thread
From: Horst von Brand @ 2006-01-28  4:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Daniel Barkalow, Petr Baudis

Junio C Hamano <junkio@cox.net> wrote:
> This is still a draft/WIP, but "release early" is a good
> discipline, so...

One thing that has bugged me from the beginning of this, and which does
come out of your example: Why only project/subproject? In your example, you
have the kernel (OK(ish)) and "rest of the world", which could itself break
up and be tracking e.g. uClibc, and dhcp, and... And perhaps the kernel
itself breaks up into (local and vanilla) components.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

* Re: Notes on Subproject Support
  2006-01-28  4:55 ` Horst von Brand
@ 2006-01-28 21:43   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-01-28 21:43 UTC (permalink / raw)
  To: Horst von Brand; +Cc: git, Daniel Barkalow, Petr Baudis

Horst von Brand <vonbrand@inf.utfsm.cl> writes:

> One thing that has bugged me from the beginning of this, and which does
> come out of your example: Why only project/subproject? In your example, you
> have the kernel (OK(ish)) and "rest of the world",...

Because I presented the example badly, perhaps?

There is nothing that prevents you from having more "bind" lines
than the example showed, to have one project that works with N
subprojects.  In fact, the examples in earlier threads used a
project with the kernel and gcc subprojects -- I just felt it
was so obvious you can do N subprojects instead of just one that
I used just one subproject in the latest round of examples for
the sake of brevity.

And there is nothing that prevents you from having "bind" lines
in the subproject commit objects, either.

The structure the lower level objects support with the "bound
commit" extension is not about "project vs subproject".  You can
express "project that has subprojects each of which has
subsubprojects".
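
To illustrate the nesting, here is a sketch that walks such a structure.
It assumes each commit is reduced to a mapping from bound path to bound
commit name, and that paths bound inside a subproject are relative to
that subproject; the commit names, the "libs/" path and the walk_bound
function are all made up for the example.

```python
from typing import Dict, Iterator, Tuple

# commit name -> {bound path: bound commit name}
Commits = Dict[str, Dict[str, str]]


def walk_bound(commits: Commits, name: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (full path, commit) for every bound commit, at any depth."""
    for path, bound in sorted(commits.get(name, {}).items()):
        full = prefix + path
        yield full, bound
        # A bound commit may carry "bind" lines of its own.
        yield from walk_bound(commits, bound, full)


commits: Commits = {
    "top": {"linux-2.6/": "k1", "appliance/": "a1"},
    "a1": {"libs/": "l1"},
}
layout = list(walk_bound(commits, "top"))
```

With the sample data, layout lists appliance/ first, then
appliance/libs/ nested under it, then linux-2.6/.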

Now, it is a totally separate issue whether anybody sane would
want to keep track of such a structure, or whether we would be
better off leaving it to the build infrastructure specific to
each toplevel project, as argued by some earlier.

end of thread, other threads:[~2006-01-28 21:43 UTC | newest]

Thread overview: 17+ messages
2006-01-23  1:35 Notes on Subproject Support Junio C Hamano
2006-01-23  3:50 ` Daniel Barkalow
2006-01-23  4:36   ` Junio C Hamano
2006-01-23  5:48     ` Junio C Hamano
2006-01-23  6:06       ` Alexander Litvinov
2006-01-23 16:48         ` Daniel Barkalow
2006-01-23  8:38       ` Junio C Hamano
2006-01-23 17:57       ` Daniel Barkalow
2006-01-24  1:50         ` Junio C Hamano
2006-01-23 16:31     ` Daniel Barkalow
2006-01-24  1:50       ` Junio C Hamano
2006-01-24  4:22         ` Daniel Barkalow
2006-01-23  8:00 ` Junio C Hamano
2006-01-23 12:50 ` Martin Atukunda
2006-01-23 19:30   ` Junio C Hamano
2006-01-28  4:55 ` Horst von Brand
2006-01-28 21:43   ` Junio C Hamano
