RFC: Subprojects

* RFC: Subprojects
@ 2006-01-11 15:58 Simon Richter
  2006-01-11 16:44 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Simon Richter @ 2006-01-11 15:58 UTC (permalink / raw)
  To: git

Hello,

one thing that I have been missing so far in all SCM systems apart from 
CVS (and there it's just coincidence) is the ability to include a 
project as part of a bigger project. Developing software for embedded 
systems, I need that feature fairly often, for example the source tree 
for a particular device almost always contains one or more Linux trees, 
some binutils, gcc and gdb stuff and so on.

The changes necessary here would be fairly simple: "tree" objects would 
point to a "commit" or a "tag" object when a subproject is used.

In the working directory, this would be represented by a .git directory 
that contains a symref to the embedding project instead of the objects 
directory. Head pointers are only required if you intend to push changes 
upstream to the maintainer of the embedded project. Each subproject has 
its own index.

Would such a feature make sense, and what behaviour would make the most 
sense for the various operations (e.g. shall commits in the inner 
project propagate to the outer?)?

    Simon

* Re: RFC: Subprojects
  2006-01-11 15:58 RFC: Subprojects Simon Richter
@ 2006-01-11 16:44 ` Johannes Schindelin
  2006-01-11 16:52   ` Simon Richter
  2006-01-12  3:19 ` Alexander Litvinov
  2006-01-15 15:07 ` [RFC][PATCH] Cogito support for simple subprojects Petr Baudis
  2 siblings, 1 reply; 56+ messages in thread
From: Johannes Schindelin @ 2006-01-11 16:44 UTC (permalink / raw)
  To: Simon Richter; +Cc: git

Hi,

On Wed, 11 Jan 2006, Simon Richter wrote:

> one thing that I have been missing so far in all SCM systems apart from CVS
> (and there it's just coincidence) is the ability to include a project as part
> of a bigger project. Developing software for embedded systems, I need that
> feature fairly often, for example the source tree for a particular device
> almost always contains one or more Linux trees, some binutils, gcc and gdb
> stuff and so on.

What I do: I call it a branch. While this might seem technically 
incorrect, it is not.

And since the subprojects are really independent, you can connect them by 
an octopus.

> The changes necessary here would be fairly simple: "tree" objects would point
> to a "commit" or a "tag" object when a subproject is used.

Sorry, we discussed similar things already. It is not necessary to change 
the structure. Even more: it makes no sense. Why would you want to have 
two or more commit messages for the same revision?

Remember: trees, commits and tags (objects in general) are immutable. You 
may think that you just commit a new revision of the subproject, and it is 
picked up by the overall project, but that is not the case!

> In the working directory, this would be represented by a .git directory that
> contains a symref to the embedding project instead of the objects directory.
> Head pointers are only required if you intend to push changes upstream to the
> maintainer of the embedded project. Each subproject has its own index.

You can do this like I said: use branches (and possibly a common 
GIT_OBJECT_DIRECTORY to save on disk space).

Hth,
Dscho

* Re: RFC: Subprojects
  2006-01-11 16:44 ` Johannes Schindelin
@ 2006-01-11 16:52   ` Simon Richter
  2006-01-11 17:42     ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Richter @ 2006-01-11 16:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hello,

Johannes Schindelin wrote:

> And since the subprojects are really independent, you can connect them by 
> an octopus.

The important thing for me is that I need to be able to transfer them 
easily, or turn a subdirectory into a subproject or vice versa.

> Sorry, we discussed similar things already. It is not necessary to change 
> the structure. Even more: it makes no sense. Why would you want to have 
> two or more commit messages for the same revision?

Because the commit affects both the subproject and the master project.

> Remember: trees, commits and tags (objects in general) are immutable. You 
> may think that you just commit a new revision of the subproject, and it is 
> picked up by the overall project, but that is not the case!

This is why I asked for intended behaviour on commit in a subproject. It 
is pretty obvious that the master project would need a new tree object 
to reference the new version of the subproject, and hence, a new commit 
to keep it all together (and correctly so, since I would like my master 
project to refer to that particular version of the subproject that is 
known to work).

> You can do this like I said: use branches (and possibly a common 
> GIT_OBJECT_DIRECTORY to save on disk space).

Yes, however that wouldn't cover consistency between the subprojects, 
would it?

    Simon

* Re: RFC: Subprojects
  2006-01-11 16:52   ` Simon Richter
@ 2006-01-11 17:42     ` Linus Torvalds
  2006-01-11 19:43       ` Simon Richter
  2006-01-14  8:59       ` Junio C Hamano
  0 siblings, 2 replies; 56+ messages in thread
From: Linus Torvalds @ 2006-01-11 17:42 UTC (permalink / raw)
  To: Simon Richter; +Cc: Johannes Schindelin, git

On Wed, 11 Jan 2006, Simon Richter wrote:
> 
> The important thing for me is that I need to be able to transfer them easily,
> or turn a subdirectory into a subproject or vice versa.

Turning a _snapshot_ of a subproject into a subdirectory is easy: you can 
literally just create a subdirectory, copy it there, and it will re-use 
all the objects that the subproject uses (ie the top-level project will 
have a "tree" entry that just points to the same tree entry as the 
top-level commit in the sub-project).

However, while that works as a way to import snapshots, it doesn't work in 
any other way. It allows you to share objects with the "real project", and 
it's space-efficient etc, but there's no shared history, and you cannot 
merge back-and-forth, which is probably what you really want to do.

Quite frankly, you really probably want more of a "git-aware symlink" kind 
of thing. I'd really hesitate (in fact, I'd object) to re-use the existing 
"tree" type for it, but you're not the only one to have asked for 
subproject support, so this is clearly not an odd request.

> > Sorry, we discussed similar things already. It is not necessary to change
> > the structure. Even more: it makes no sense. Why would you want to have two
> > or more commit messages for the same revision?
> 
> Because the commit affects both the subproject and the master project.

What we _could_ do is for you to first do a commit in the "independent" 
subproject (it really would be a totally independent git repository in all 
ways: you could continue to merge it with other subprojects of the same 
type), and then you could commit a new pointer to that subproject in the 
master project. 

The two would really be fundamentally independent: they'd be two different 
git projects, one would just have a strange kind of "symlink" to the 
other, which would include a name and the top commit SHA1 of the other 
project.

Getting everything to work reasonably seamlessly would be potentially 
painful (getting "git diff" to recurse into the subdirectory correctly is 
non-trivial: you'd have a separate ".git/index" file for it), but it 
sounds doable.

I'd suggest adding a new kind of object ("gitlink") which has some 
well-specified format (20-byte SHA1 + ASCII C string "name" - the name 
translation to external repository would be done in the .git/config file 
of the "outer" project). Then a special file mode to indicate that in the 
"struct tree", and support for "git-update-cache" to understand how such 
an object is really tied into the "<pathname>/.git/HEAD" file rather than 
the rest of the directory contents.

Then a "git fetch" would have to be taught to recursively fetch the other 
subproject when the "gitlink" changes.

It should be doable: somebody could try to implement a rough first draft 
(maybe not very seamless at first).

		Linus

* Re: RFC: Subprojects
  2006-01-11 17:42     ` Linus Torvalds
@ 2006-01-11 19:43       ` Simon Richter
  2006-01-11 20:06         ` Linus Torvalds
  2006-01-14  8:59       ` Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: Simon Richter @ 2006-01-11 19:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, git

Hi,

Linus Torvalds wrote:

> Turning a _snapshot_ of a subproject into a subdirectory is easy: you can 
> literally just create a subdirectory, copy it there, and it will re-use 
> all the objects that the subproject uses (ie the top-level project will 
> have a "tree" entry that just points to the same tree entry as the 
> top-level commit in the sub-project).

Exactly. My proposal is to allow the tree object to point to the 
toplevel commit object directly, thus importing the entire project[1].

> However, while that works as a way to import snapshots, it doesn't work in 
> any other way. It allows you to share objects with the "real project", and 
> it's space-efficient etc, but there's no shared history, and you cannot 
> merge back-and-forth, which is probably what you really want to do.

Well, the history cannot be really shared, as turning subtrees to 
projects and vice versa is a valid use case (and in fact something I do 
pretty often in my projects, as various "helper" classes evolve into 
"utility" libraries). Since the subproject needs to be self-contained, 
the history before it became a separate project will be difficult to 
represent, to say the least.

> Quite frankly, you really probably want more of a "git-aware symlink" kind 
> of thing. I'd really hesitate (in fact, I'd object) to re-use the existing 
> "tree" type for it, but you're not the only one to have asked for 
> subproject support, so this is clearly not an odd request.

[...]

> What we _could_ do is for you to first do a commit in the "independent" 
> subproject (it really would be a totally independent git repository in all 
> ways: you could continue to merge it with other subprojects of the same 
> type), and then you could commit a new pointer to that subproject in the 
> master project. 

Exactly. The questions I posed in the last paragraph of the initial 
mail, rewritten for clarity, would be
  - "should cg-commit automatically create a commit in the master 
project when a change in the subproject is committed?", and
  - "should cg-commit automatically commit all changes to subprojects 
when a path that has been listed on the command line contains a 
subproject?".

There are three cases, basically:

  - change to subproject, part of a larger set of changes, not ready for 
prime time: A commit in the subproject, master left alone (obviously, 
the directory would show as "modified").
  - change to subproject, fixing a bug that affects the master project. 
You'd expect these to happen often, as I could fix stuff that the master 
doesn't care about in another tree. In this case, you'd want a new 
commit to happen in the master as well, for everyone to enjoy.
  - change to subproject and master that need to go in sync, like 
renaming a configure option. Obviously, this can also happen after an 
update of the subproject, so creating a new commit on the master after 
an update is bad, but normal behaviour for update would be to merge and 
create a commit instantly if there were no conflicts.

> The two would really be fundamentally independent: they'd be two different 
> git projects, one would just have a strange kind of "symlink" to the 
> other, which would include a name and the top commit SHA1 of the other 
> project.

Well, that would be exactly what the tree contains. One could do an 
additional level of indirection with another object that just points to 
an sha1 (because its name is given by the tree referencing it).

Having such an object would mainly have the advantage of being extensible.

> I'd suggest adding a new kind of object ("gitlink") which has some 
> well-specified format (20-byte SHA1 + ASCII C string "name" - the name 
> translation to external repository would be done in the .git/config file 
> of the "outer" project).

Well, most people would likely not care about the external repository of 
the subproject, as they can get all the objects from the master project. 
I'm not even sure the subproject needs an "origin" branch by default, as 
you can push changes to the master project's maintainers who would then 
eventually push them on to the subproject's maintainers (in fact I 
believe it's vital to be able to keep changes to the subproject in the 
master project so development can go on while the subproject maintainers 
review the changes).

> Then a special file mode to indicate that in the 
> "struct tree", and support for "git-update-cache" to understand how such 
> an object is really tied into the "<pathname>/.git/HEAD" file rather than 
> the rest of the directory contents.

That sounds pretty simple actually.

> Then a "git fetch" would have to be taught to recursively fetch the other 
> subproject when the "gitlink" changes.

I think that would be mostly implicit if we were to use a direct 
tree->commit reference, but can be implemented trivially even for the 
link objects.

> It should be doable: somebody could try to implement a rough first draft 
> (maybe not very seamless at first).

Indeed. I just wanted to have rough use cases thrown at me before I even 
think of implementing something like it.

    Simon

[1] There is a slight problem with that approach: When you cherry-pick 
changes into a subproject (like I did today with the asm-arm/uaccess.h 
constness fix), the subproject will have a separate branch whose head is 
known to the outer project only. When the subproject gets merged with 
its origin later, that branch is no longer needed, and it makes a lot of 
sense to have the master project reference origin again. This means that 
if you look at the history of the inner project from the POV of the 
outer, the new commit is no longer a descendant of the old, but in fact 
it may be a good idea to attempt a fast-forward merge nevertheless as 
going through the common ancestor is very likely to cause conflicts 
here, and those conflicts have already been resolved, or you wouldn't be 
seeing an updated subproject link (i.e. you just want to preserve local 
changes and stuff committed on top of the old reference).

* Re: RFC: Subprojects
  2006-01-11 19:43       ` Simon Richter
@ 2006-01-11 20:06         ` Linus Torvalds
  0 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2006-01-11 20:06 UTC (permalink / raw)
  To: Simon Richter; +Cc: Johannes Schindelin, git

On Wed, 11 Jan 2006, Simon Richter wrote:
> 
> Exactly. The questions I posed in the last paragraph of the initial mail,
> rewritten for clarity, would be
>  - "should cg-commit automatically create a commit in the master project when
> a change in the subproject is committed?", and

No.

Don't commit in the "parent" thing automatically. The sub-project should 
be as independent as possible, and you should see a commit to that to be 
nothing more than "editing" a regular file in the top-level project.

In many ways, if you decide to look into doing a "gitlink" kind of object, 
that object really _does_ conceptually point to the .git/HEAD file in the 
subproject. So when you do a commit in the subproject, conceptually that 
is no different from editing the .git/HEAD.

Then, when you want to commit the _dependency_ of the top-level project on 
the sub-project, you commit in the top level. That commit probably does 
other things too: it probably also commits the code in the top level that 
now depends on the sub-project changes.

So don't tie the two together any more than necessary. My suggested usage 
case has the big advantage that the sub-project is much less tightly 
coupled, so you can do things like "git pull" _inside_ the subproject, to 
update it, and then do a big compile in the top-level project to reflect 
the changes (and perhaps update stuff at the top level to conform to 
changes in the sub-project), and then commit in the top independently 
(which will now automatically pick up the changes to .git/HEAD that "git 
pull" did on the subproject).

>  - "should cg-commit automatically commit all changes to subprojects when a
> path that has been listed on the command line contains a subproject?".

Again, I'd really suggest not. If you keep the "gitlink" really meaning 
the ".git/HEAD" contents of the subproject (which is a good semantic 
rule), then if there is any dirty state in the sub-project, it is totally 
irrelevant to a "git commit". Because it's dirty, it's not part of 
.git/HEAD yet.

Now, obviously you should make "git status" _talk_ about the fact that the 
sub-project is dirty, so that the committer sees that the sub-project 
needs committing first (the same way "git status" now informs git commit 
about dirty files that haven't been updated).

(I'm saying "git" here all the time rather than cg-, because not only 
don't I know cogito very well, I think you want to do most of the core at 
the git level, and just teach cg about the new capability).

			Linus

* Re: RFC: Subprojects
  2006-01-11 15:58 RFC: Subprojects Simon Richter
  2006-01-11 16:44 ` Johannes Schindelin
@ 2006-01-12  3:19 ` Alexander Litvinov
  2006-01-12  4:46   ` Martin Langhoff
  2006-01-15 15:07 ` [RFC][PATCH] Cogito support for simple subprojects Petr Baudis
  2 siblings, 1 reply; 56+ messages in thread
From: Alexander Litvinov @ 2006-01-12  3:19 UTC (permalink / raw)
  To: Simon Richter; +Cc: git

On Wednesday 11 January 2006 21:58, Simon Richter wrote:
> Hello,
>
> one thing that I have been missing so far in all SCM systems apart from
> CVS (and there it's just coincidence) is the ability to include a
> project as part of a bigger project. 

I really miss this feature. This is the last stopper for moving from CVS to 
git for our project.

* Re: RFC: Subprojects
  2006-01-12  3:19 ` Alexander Litvinov
@ 2006-01-12  4:46   ` Martin Langhoff
  2006-01-12  5:25     ` Alexander Litvinov
  2006-01-12 13:38     ` Daniel Barkalow
  0 siblings, 2 replies; 56+ messages in thread
From: Martin Langhoff @ 2006-01-12  4:46 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Simon Richter, git

On 1/12/06, Alexander Litvinov <lan@ac-sw.com> wrote:
> On Wednesday 11 January 2006 21:58, Simon Richter wrote:
> > Hello,
> >
> > one thing that I have been missing so far in all SCM systems apart from
> > CVS (and there it's just coincidence) is the ability to include a
> > project as part of a bigger project.
>
> I really miss this feature. This is the last stopper for moving from CVS to
> git for our project.

What about using nested checkouts? They work great with git as-is,
just add a .gitignore file.

As Linus points out, there are many good reasons why a top-level
commit should _not_ commit the nested subproject. And once you are
observing that rule, what's left then? git status and git diff <HEAD>
can show an aggregate of top-level and nested subprojects, but that's
ease-of-use -- not something only.

What is your show stopper?

cheers,

martin

* Re: RFC: Subprojects
  2006-01-12  4:46   ` Martin Langhoff
@ 2006-01-12  5:25     ` Alexander Litvinov
  2006-01-12  5:39       ` Martin Langhoff
  2006-01-12  7:20       ` Anand Kumria
  2006-01-12 13:38     ` Daniel Barkalow
  1 sibling, 2 replies; 56+ messages in thread
From: Alexander Litvinov @ 2006-01-12  5:25 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Simon Richter, git

On Thursday 12 January 2006 10:46, Martin Langhoff wrote:
> > I really miss this feature. This is the last stopper for moving from CVS
> > to git for our project.
>
> What about using nested checkouts? They work great with git as-is,
> just add a .gitignore file.
>
> As Linus points out, there are many good reasons why a top-level
> commit should _not_ commit the nested subproject. And once you are
> observing that rule, what's left then? git status and git diff <HEAD>
> can show an aggregate of top-level and nested subprojects, but that's
> ease-of-use -- not something only.
>
> What is your show stopper?

I would agree to make separate commits for each sub project.

1. I need to have the ability to make tags and branches through all subprojects.
2. Update (pull) should update each subproject; it is hard to update them by 
hand.
3. The need of some sort of checkout script (can be solved by storing this 
script in base project, but it would be much nicer allow git fetch all 
subprojects)

Nothing else I can imagine.

* Re: RFC: Subprojects
  2006-01-12  5:25     ` Alexander Litvinov
@ 2006-01-12  5:39       ` Martin Langhoff
  2006-01-12  8:36         ` Alexander Litvinov
  2006-01-12  7:20       ` Anand Kumria
  1 sibling, 1 reply; 56+ messages in thread
From: Martin Langhoff @ 2006-01-12  5:39 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Simon Richter, git

On 1/12/06, Alexander Litvinov <lan@ac-sw.com> wrote:
> > What is your show stopper?
>
> I would agree to make separate commits for each sub project.
>
> 1. I need to have ability to make tags, branches thru all subprojects.

I suspect that this is a bad idea -- for the same reason as committing
to a subproject is a bad idea. The subprojects most likely have their
own external repositories -- and lifecycles of their own. The same
headname/branchname won't do.

> 2. Update (pull) should update each subproject; it is hard to update them by
> hand.

A simple shellscript can help you here.
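
As a rough sketch of what such a script could look like (an assumption about the layout, not something git ships): it treats every immediate subdirectory that contains a .git directory as a nested checkout and runs "git pull" in each. The function names list_subprojects and update_subprojects, and the SUBPROJECT_CMD override, are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of git.

# Print each immediate subdirectory of $1 (default: .) that contains a
# .git directory, i.e. looks like a nested checkout.
list_subprojects() {
    top=${1:-.}
    for d in "$top"/*/; do
        [ -d "${d}.git" ] && printf '%s\n' "${d%/}"
    done
    return 0
}

# Run an update command (default: git pull) inside each nested checkout.
# SUBPROJECT_CMD is an assumed override so the loop can be tried safely.
update_subprojects() {
    list_subprojects "$1" | while IFS= read -r d; do
        # Subshell so the cd does not leak; word splitting of the command
        # string is intentional. stdin is redirected so the command cannot
        # swallow the remaining directory names from the pipe.
        (cd "$d" && ${SUBPROJECT_CMD:-git pull} </dev/null) ||
            echo "update failed in $d" >&2
    done
}
```

Calling update_subprojects from the top-level directory would then pull every nested checkout in turn; setting SUBPROJECT_CMD to something harmless first is an easy way to see which directories it would touch.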

> 3. The need of some sort of checkout script (can be solved by storing this
> script in base project, but it would be much nicer allow git fetch all
> subprojects)

As you say, a bootstrapping shellscript can sort this out.

Sounds quite doable ;-)

(have to warn you though -- git is quite addictive. there's no going back...)

cheers,

martin

* Re: RFC: Subprojects
  2006-01-12  5:25     ` Alexander Litvinov
  2006-01-12  5:39       ` Martin Langhoff
@ 2006-01-12  7:20       ` Anand Kumria
  1 sibling, 0 replies; 56+ messages in thread
From: Anand Kumria @ 2006-01-12  7:20 UTC (permalink / raw)
  To: git

On Thu, 12 Jan 2006 11:25:33 +0600, Alexander Litvinov wrote:

> On Thursday 12 January 2006 10:46, Martin Langhoff wrote:
>> > I really miss this feature. This is the last stopper for moving from CVS
>> > to git for our project.
>>
>> What about using nested checkouts? They work great with git as-is,
>> just add a .gitignore file.
>>
>> As Linus points out, there are many good reasons why a top-level
>> commit should _not_ commit the nested subproject. And once you are
>> observing that rule, what's left then? git status and git diff <HEAD>
>> can show an aggregate of top-level and nested subprojects, but that's
>> ease-of-use -- not something only.
>>
>> What is your show stopper?
> 
> I would agree to make separate commits for each sub project.
> 
> 1. I need to have ability to make tags, branches thru all subprojects.
> 2. Update (pull) should update each subproject; it is hard to update them by 
> hand.
> 3. The need of some sort of checkout script (can be solved by storing this 
> script in base project, but it would be much nicer allow git fetch all 
> subprojects)
> 
> Nothing else I can imagine.

It sounds like you want 'config-manager',
http://packages.debian.org/unstable/devel/config-manager. It doesn't
support git (yet), but I can't imagine it would be hard to add that support.

Cheers,
Anand

* Re: RFC: Subprojects
  2006-01-12  5:39       ` Martin Langhoff
@ 2006-01-12  8:36         ` Alexander Litvinov
  2006-01-12  8:58           ` Alex Riesen
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Litvinov @ 2006-01-12  8:36 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Simon Richter, git

> > 1. I need to have ability to make tags, branches thru all subprojects.
>
> I suspect that this is a bad idea -- for the same reason as committing
> to a subproject is a bad idea. The subprojects most likely have their
> own external repositories -- and lifecycles of their own. The same
> headname/branchname won't do.

This is one of the main ideas of supporting subprojects. Everything else I 
can already do. I want to be able to make a tag over the composite project 
and be able to fetch the tagged files later. The same goes for branches.

I clearly understand that if I make a tag/branch on a subproject outside my 
composite project, I will not be able to work with it; this is OK.

But tags/branches on the whole composite project are a must.

I hope it is possible to teach git (or maybe something else) to scan all 
subprojects, fetch the common tags/branches and work with them.

* Re: RFC: Subprojects
  2006-01-12  8:36         ` Alexander Litvinov
@ 2006-01-12  8:58           ` Alex Riesen
  0 siblings, 0 replies; 56+ messages in thread
From: Alex Riesen @ 2006-01-12  8:58 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Martin Langhoff, Simon Richter, git

On 1/12/06, Alexander Litvinov <lan@ac-sw.com> wrote:
> > > 1. I need to have ability to make tags, branches thru all subprojects.
> >
> > I suspect that this is a bad idea -- for the same reason as committing
> > to a subproject is a bad idea. The subprojects most likely have their
> > own external repositories -- and lifecycles of their own. The same
> > headname/branchname won't do.
>
> This is one of the main ideas of supporting subprojects. Everything else I
> can already do. I want to be able to make a tag over the composite project
> and be able to fetch the tagged files later. The same goes for branches.
>
> I clearly understand that if I make a tag/branch on a subproject outside my
> composite project, I will not be able to work with it; this is OK.
>
> But tags/branches on the whole composite project are a must.

Linus' gitlink proposal will probably help you here: the gitlink will be
tagged as well, so you just have to teach git-checkout about checking
out subprojects.

* Re: RFC: Subprojects
  2006-01-12  4:46   ` Martin Langhoff
  2006-01-12  5:25     ` Alexander Litvinov
@ 2006-01-12 13:38     ` Daniel Barkalow
  1 sibling, 0 replies; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-12 13:38 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Alexander Litvinov, Simon Richter, git

On Thu, 12 Jan 2006, Martin Langhoff wrote:

> What about using nested checkouts? They work great with git as-is,
> just add a .gitignore file.
> 
> As Linus points out, there are many good reasons why a top-level
> commit should _not_ commit the nested subproject. And once you are
> observing that rule, what's left then? git status and git diff <HEAD>
> can show an aggregate of top-level and nested subprojects, but that's
> ease-of-use -- not something only.
> 
> What is your show stopper?

The core structural thing (which I'm not sure CVS handles) is having each 
commit of the outer project specify the commit of the inner project that 
it contains in some way. This would be good with CVS, but is vital with 
git, because there's no way of estimating it when you don't have a linear 
history. (With CVS, you could say that the inner project version for a 
given outer project version should be the version that was current when 
the outer project was committed. But that isn't well-defined for git.) If 
you try to debug anything involving the history, you'd have problems with 
choosing versions of the two projects that don't actually match.

	-Daniel
*This .sig left intentionally blank*

* Re: RFC: Subprojects
  2006-01-11 17:42     ` Linus Torvalds
  2006-01-11 19:43       ` Simon Richter
@ 2006-01-14  8:59       ` Junio C Hamano
  2006-01-14 19:16         ` Linus Torvalds
                           ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-14  8:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, git, Simon Richter

Linus Torvalds <torvalds@osdl.org> writes:

> I'd suggest adding a new kind of object ("gitlink") which has some 
> well-specified format (20-byte SHA1 + ASCII C string "name" - the name 
> translation to external repository would be done in the .git/config file 
> of the "outer" project). Then a special file mode to indicate that in the 
> "struct tree", and support for "git-update-cache" to understand how such 
> an object is really tied into the "<pathname>/.git/HEAD" file rather than 
> the rest of the directory contents.
>
> Then a "git fetch" would have to be taught to recursively fetch the other 
> subproject when the "gitlink" changes.

There are two positive properties about this setup, and one
negative:

 + The contained project is kept totally independent and does
   not have to know it is contained.

 + The tree for the contained project can be rooted anywhere in
   the containing project's tree.

 - The contained project cannot be rooted at the same level or
   higher than the containing project; the containing project
   can only delegate a whole subdirectory to the contained
   project.

The "embedded software" example Simon originally suggested can
be represented with the above.  I'll think aloud for a while
here, because I am of a slow kind who needs a more-or-less
concrete illustration to understand what is being discussed
(that is primarily why I have not said anything on this topic so
far).

The "containing" project would have a handful of "gitlink" objects
among other things.  The toplevel tree object from a commit in
such a project might look like this (mode bits 0160000 is
S_IFDIR|S_IFLNK, which is what this thing is):

        $ git ls-tree HEAD
        0100644 blob 012345... Makefile
        0100644 blob 123456... README
        0160000 link 234567... gcc-4.0
        0160000 link 345678... linux-2.6
        0040000 tree 456789... src
        $ git cat-file -t 345678
        link
        $ git cat-file link 345678
        commit 87530db5ec7d519c7ba334e414307c5130ae2da8
        url git://...torvalds/linux-2.6.git/

        The upstream Linux 2.6 repository.
        $ cd linux-2.6 && git-rev-parse --verify HEAD
        87530db5ec7d519c7ba334e414307c5130ae2da8
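
The mode value used for the link entries above can be checked with a little shell arithmetic. This is only a sanity check of the bit pattern, assuming the conventional octal values of S_IFDIR and S_IFLNK from <sys/stat.h>; the variable name link_mode is made up here.

```shell
#!/bin/sh
# Conventional octal values of the file-type bits from <sys/stat.h>.
S_IFDIR=0040000   # directory
S_IFLNK=0120000   # symbolic link

# OR the two together and print the result as 7-digit octal.
link_mode=$(printf '%07o' $(( $S_IFDIR | $S_IFLNK )))
echo "$link_mode"   # prints 0160000
```

The combined value does not collide with the plain directory (0040000) or symlink (0120000) modes, which is what makes it usable as a distinct marker in a tree entry.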

The URL will be used as a suggestion for people who cloned this tree
to set up their repository.  The place and method you clone from
Linus's tree might be different, so this has to stay a suggestion
and should be overridable by the repository owner.  And to help
people at an unusual location you could have a textual comment at
the end, just like tags.
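
To make the layout concrete, here is a minimal sketch that only formats the text of such a link: a "commit" line, a "url" line, a blank line, then the free-form comment. The function name format_link is hypothetical, the layout is taken from the cat-file example above rather than from any existing git command, and the URL in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical formatter for the link text described above.
# $1 = commit SHA1, $2 = suggested URL, $3 = free-form comment.
format_link() {
    printf 'commit %s\nurl %s\n\n%s\n' "$1" "$2" "$3"
}

# Example with a placeholder URL (not a real repository location).
format_link 87530db5ec7d519c7ba334e414307c5130ae2da8 \
    git://example.invalid/linux-2.6.git/ \
    'The upstream Linux 2.6 repository.'
```

Storing this text as an object, and overriding the suggested URL per repository, would still be up to whatever tool ends up implementing the link type.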

How would this get set up initially?  Here is one way.

	$ git init-db
	$ edit Makefile README src/*
	$ git clone git://...torvalds/linux-2.6.git/ linux-2.6
	$ git clone git://.../gcc-4.0.git/ gcc-4.0
	$ link=$(echo 'The upstream Linux 2.6 repository.' |
		 git-mklink linux-2.6)
	$ git update-index --add --cacheinfo 0160000 $link linux-2.6
	$ : ;# same for gcc-4.0
	$ git add . ;# add the rest as usual
	$ git commit
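
The index-entry step can be tried with commands that exist in git
today.  This is a rough sketch under stated assumptions, not the
proposal itself: "git-mklink" and the separate "link" object are
hypothetical, so the entry below records the commit id directly under
mode 160000, and an empty local repository stands in for the real
kernel clone.

```shell
set -e
top=$(mktemp -d)
cd "$top"
git init -q .

# Stand-in for "git clone git://.../linux-2.6.git/ linux-2.6":
# a nested repository with a single empty commit.
git init -q linux-2.6
git -C linux-2.6 -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'stand-in for upstream history'
commit=$(git -C linux-2.6 rev-parse --verify HEAD)

# Record the subproject in the containing project's index as a
# mode-160000 entry pointing at that commit.
git update-index --add --cacheinfo 160000 "$commit" linux-2.6

# Show the resulting index entry: mode, object name, stage, path.
entry=$(git ls-files -s linux-2.6)
echo "$entry"
```

The containing index only holds the one entry for linux-2.6; none of
the files inside the nested repository are listed.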

I presume that the index file would have the "gitlink" object just
like in a tree object.  The usual merge rules would apply to
those index entries; we should be able to treat gitlinks just
like we handle symlinks.

Interesting would be "git checkout-index linux-2.6" (or what
"git read-tree -u" does in this "containing" project for
linux-2.6 subdirectory).  After descending into linux-2.6, it
should not just do "git reset --hard $commit" for the commit
recorded in the gitlink (the user may have local modifications
in the subtree).  Doing "git update-ref HEAD $commit" there is
not quite right either because the index there would then need
to be adjusted as well.  Perhaps the real core level commands
such as "checkout-index" and "read-tree -u" should fail when the
subproject tree is dirty, just like "read-tree -m old new" does
not always have to succeed.

What would "git-diff-index/git-diff-tree/git-diff-files" do
with them?

	$ git-diff-files linux-2.6

would compare the commit recorded in the link and what is
checked out in the linux-2.6/.git/HEAD and report that
difference.  So would the other git-diff-* siblings.  At the core level
we do not have to recurse and look at linux-2.6/.git/index (we
may end up doing so in the end, I dunno; initially we said at
the core level we do not have to generate patches but we ended
up adding the -p option to all of the git-diff-* siblings).

Fetching/cloning at the core level is easy.  "git-fetch-pack"
would just need to do one level, but Porcelains need to address
how to actually arrange the subprojects cloning to happen, which
is harder.

"git clone" would say: "Ah, now I see these gitlinks; we need to
clone them.  The linux-2.6 directory needs to be populated with
commit 87530d from git://...torvalds/linux-2.6.git/ repository.
Would this work for you, or would you use a different mirror?"
and then it clones the repository and sets linux-2.6/.git/HEAD
to the named commit and does a checkout.  The URL used for this
actual subcloning would need to be stored somewhere in $GIT_DIR/,
perhaps in config as you suggested.  I do not think we need a
separate name for it -- we can probably say "linux-2.6" for this
(i.e. use the pathname itself as the key).

What happens if the containing project wants to move these
gitlinks (or remove them)?  When checking out such a commit with
"git-read-tree -u", would the subproject directory be wiped out
(again, such a "read-tree" would be prevented if it would result
in information loss)?

All of this sounds like quite a lot of change, with brittleness.

Now I'll think aloud about a completely different design.

We could simply overlay the projects.  I think this is what
Johannes suggested earlier.

You keep one branch for each "subproject", and make commits into
each branch (i.e. if you modified files for the upstream kernel,
the change is committed to the branch for linux-2.6 subproject),
but when checking things out, you do an equivalent of octopus
merge across subprojects.

One downside of this approach is we cannot re-root the
subprojects until we update read-tree and write-tree, but I
suspect that would be a lot smaller change.  Once that is done,
we could:

 $ git init-db
 $ mkdir linux-2.6
 $ H=$(git-fetch-pack -k git://...torvalds/linux-2.6.git/ master)
 $ echo $H >.git/refs/heads/kernel
 $ : ;# same for gcc-4.0
 $ cat >>.git/config <<EOF
 [core]
 	branchroot = linux-2.6 for kernel
 	branchroot = gcc-4.0 for gcc
 EOF
 $ git add . ;# add src and stuff
 $ git commit ;# commits only the scaffolding into "master"

So far, we fetched the kernel and gcc HEAD with needed objects
and stored them into separate branches.  Then:

 $ git setup-overlay embed master kernel gcc ;# works like an octopus

The setup-overlay command would create a new branch "embed" to
hold an octopus merge across named branches "master", "kernel",
and "gcc", and mark that the repository is in a funny "overlay"
mode, in which various commands work differently from usual:

 $ edit linux-2.6/CREDITS gcc-4.0/COPYING Makefile
 $ git commit -a

The "commit" needs to be taught to look at what setup-overlay
left for us, pick out paths that belong to each constituent
branch and do a re-rooting write-tree, for each branch.

This would keep changes to subprojects independent painlessly,
but we would also need a way to tie the versions of subprojects
together (i.e. "this version of src was done with this
particular version of linux-2.6").  This can be done by
committing the octopus to "embed" branch.  Probably the easiest
would be to make one commit to each modified constituent branch,
and after that make another commit to "embed" to commit the
octopus to keep track of the aggregation --- the commit would
have the parents set to the previous embed and top commit of
each constituent branch.

If we do not need re-rooting (e.g. redo your slurping gitk into
git.git), I think all of the above can be done without any core
changes.  It would be a lot of Porcelainish work, but I suspect
the core impact would be smaller.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14  8:59       ` Junio C Hamano
@ 2006-01-14 19:16         ` Linus Torvalds
  2006-01-14 19:32           ` A Large Angry SCM
  2006-01-14 20:16           ` Junio C Hamano
  2006-01-16  7:28         ` Alexander Litvinov
  2006-02-20 13:16         ` Uwe Zeisberger
  2 siblings, 2 replies; 56+ messages in thread
From: Linus Torvalds @ 2006-01-14 19:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, Simon Richter

On Sat, 14 Jan 2006, Junio C Hamano wrote:

>  + The contained project is kept totally independent and does
>    not have to know it is contained.
> 
>  + The tree for the contained project can be rooted anywhere in
>    the containing project's tree.

Right.

>  - The contained project cannot be rooted at the same level or
>    higher than the containing project; the containing project
>    can only delegate a whole subdirectory to the contained
>    project.

Yes.

However, I think this is actually a _huge_ advantage.

The thing is, if you do the contained projects as "union projects" as you 
suggest, I will bet that it will really really suck, because it ends up 
losing the two positives above.

In particular, any real independent project will have its own "Makefile" 
or "configure-in", and often its own "src" subdirectory or other 
pseudo-standard names.

And the "contained project as a link" approach has zero problems with that 
at all, exactly because it keeps the projects clearly separate - just 
linked (one way).

> What would "git-diff-index/git-diff-tree/git-diff-files" do
> with them?

I would actually argue that git itself wouldn't do a whole lot with them. 
There are real advantages to seeing only the diffs wrt _one_ of the 
projects, and I'd argue that

	git-diff-*

would actually act like they now act for directories that they don't 
recurse into, ie you'd see something like

	:160000 160000 5eb57670... 3f1a42aa... M	sub-project

and it would be up to higher-level porcelain to recurse.

Why? Partly because that's actually likely enough for a lot of users: you 
_can_ use just the raw git programs by just doing

	cd sub-project
	git diff
	..
	git commit

and so technically you aren't really missing a lot. The capabilities are 
there, you just have to do some more by hand (but in many ways that is 
_good_: it makes it obvious that you're really committing a _different_ 
subproject).

The other reason? A lot of the git infrastructure really does only work on 
the "one project" level. The programs work with _one_ index, not two. 
Reading two trees is perfectly possible, but unless you keep them in 
separate stages, you can't separate them afterwards. IOW, trying to be 
recursive really does end up being a big change, for very little gain (and 
for a lot of potential bugs and instability).

In contrast, doing it at a higher level means that you have a simple and 
reliable lower level that you can trust. Layering is good.
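
The non-recursing behaviour described above can be tried with
present-day git, which reports a changed mode-160000 entry as one raw
line.  This is a sketch under assumptions: the nested repository
"sub-project", its empty commits and the helper new_commit are made up
for the illustration.

```shell
set -e
top=$(mktemp -d)
cd "$top"
git init -q .
git init -q sub-project

# Hypothetical helper: add one empty commit to the nested repository.
new_commit() {
    git -C sub-project -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m "$1"
}

new_commit 'first'
# Record the current sub-project commit in the top-level index.
git update-index --add --cacheinfo 160000 \
    "$(git -C sub-project rev-parse HEAD)" sub-project

# Move the sub-project forward without telling the top level.
new_commit 'second'

# The top level reports a single raw line for the whole sub-project
# and does not descend into it.
raw=$(git diff-files)
echo "$raw"
```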

> Fetching/cloning at the core level is easy.  "git-fetch-pack"
> would just need to do one level, but Porcelains need to address
> how to actually arrange the subprojects cloning to happen, which
> is harder.
> 
> "git clone" would say: "Ah, now I see these gitlinks; we need to
> clone them.

Actually, I would say no - that's actually not a "clone" operation so much 
as a "checkout" operation. There are strong arguments that you should 
_not_ clone sub-projects when you clone the top-level project: there's no 
reason to. Anybody else who clones it will have all the information you 
have, so cloning the sub-project is just extra work.

So only if you actually check it out (which is often in practice the 
second stage of the cloning, of course) do you want to fetch the 
subproject too. But even then you might want to ask the user (he may have 
a local repository for that sub-project somewhere else, so going to the 
"canonical name" might be the wrong thing to do - and he might not even 
care, because he might want to work _just_ on the top-level project).

> Now I'll think aloud about a completely different design.
> 
> We could simply overlay the projects.  I think this is what
> Johannes suggested earlier.
> 
> You keep one branch for each "subproject", and make commits into
> each branch (i.e. if you modified files for the upstream kernel,
> the change is committed to the branch for linux-2.6 subproject),
> but when checking things out, you do an equivalent of octopus
> merge across subprojects.

I think this one has serious disadvantages:

 - it's much less obvious when there are common names and especially 
   common subdirectories.
 - in _practice_, almost all sub-projects are kept in sub-directories. Are 
   you going to change the sub-project git tree? How are you going to 
   merge back to the original sub-project?
 - iow, I think this only works for sub-projects that are totally 
   controlled by the top-level project - in which case they might as well 
   just be totally merged into the top level (the way we did with the 
   "tools" project, and largely with "gitk").

In the "gitk" case, we could actually continue to keep gitk a separate 
project, but that was really fortunate: it's purely because gitk ends up 
being a single file, with no Makefile at all to build it independently 
etc. The moment we integrated the "tools" sub-project into git, we lost 
the ability to do that, exactly because they now needed to share Makefiles 
etc, making all further development very inter-twined.

Put another way: the moment you have linkages going both ways between the 
subproject and the top-level project, it's no longer two separate 
projects. At that point, it in practice becomes one, since the sub-project 
can no longer do independent development without merging becoming a big 
issue.

The advantage of having a "git link" is exactly the fact that the 
dependency goes only one way. The subproject remains truly independent.

			Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14 19:16         ` Linus Torvalds
@ 2006-01-14 19:32           ` A Large Angry SCM
  2006-01-14 20:02             ` Linus Torvalds
  2006-01-14 20:16           ` Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: A Large Angry SCM @ 2006-01-14 19:32 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Junio C Hamano, Johannes Schindelin,
	Simon Richter

So far I've not seen any convincing arguments why the sub-projects can 
not be managed by the Makefile, or equivalent, of the super-project. 
Particularly when the sub-projects have a life of their own.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14 19:32           ` A Large Angry SCM
@ 2006-01-14 20:02             ` Linus Torvalds
  2006-01-14 20:30               ` A Large Angry SCM
  2006-01-16  7:48               ` Alex Riesen
  0 siblings, 2 replies; 56+ messages in thread
From: Linus Torvalds @ 2006-01-14 20:02 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: git, Junio C Hamano, Johannes Schindelin, Simon Richter

On Sat, 14 Jan 2006, A Large Angry SCM wrote:
>
> So far I've not seen any convincing arguments why the sub-projects can not be
> managed by the Makefile, or equivalent, of the super-project. Particularly
> when the sub-projects have a life of their own.

Now, from a developer standpoint I actually agree with you. I find 
sub-projects totally useless - I'm much happier just having separate 
trees.

The advantage (as far as I can tell) of sub-projects is not that they are 
easier to develop in, but that it's a total nightmare for the technical 
_user_ to download ten different projects from ten different sites, and 
configure them properly and install them in the right order, and keep them 
up-to-date.

There are projects that I simply gave up even trying to track: I wasn't 
interested in being a developer per se, but I _was_ interested in trying 
to test and give feedback to the current development tree - but it was 
just too damn confusing to get it working.

If I could have just done a "git clone <top-level>" to get it all, I'd 
have been a much more productive user.

This is why I think sub-projects are more about "git checkout" and an 
automated "git fetch" than anything else. Doing actual development etc you 
can easily do one project at a time. "git diff" and "git commit" wouldn't 
need any real ability to recurse into subprojects and try to make it 
seamless. And if you do a "git pull" that needs to do anything but 
fast-forward, you might as well resolve the sub-projects one by one.

		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14 19:16         ` Linus Torvalds
  2006-01-14 19:32           ` A Large Angry SCM
@ 2006-01-14 20:16           ` Junio C Hamano
  2006-01-15  1:01             ` Junio C Hamano
  2006-01-16 10:44             ` Josef Weidendorfer
  1 sibling, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-14 20:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sat, 14 Jan 2006, Junio C Hamano wrote:
>
> The thing is, if you do the contained projects as "union projects" as you 
> suggest, I will bet that it will really really suck, because it ends up 
> losing the two positives above.

After a good night's sleep, I agree.  I have not thought things
through and still have a feeling that (feasibilities aside) it
would be interesting if we can do a "union projects" a la "union
mounts" (or translucent filesystem).  But that "interesting"
thing would probably not be very useful in practice.

> would actually act like they now act for directories that they don't 
> recurse into, ie you'd see something like
>
> 	:160000 160000 5eb57670... 3f1a42aa... M	sub-project
>
> and it would be up to higher-level porcelain to recurse.

This I agree with.

> The other reason? A lot of the git infrastructure really does only work on 
> the "one project" level. The programs work with _one_ index, not two. 
> Reading two trees is perfectly possible, but unless you keep them in 
> separate stages, you can't separate them afterwards. IOW, trying to be 
> recursive really does end up being a big change, for very little gain (and 
> for a lot of potential bugs and instability).

Yup.  BTW, I think with a couple of minor tweaks and giving it
the same restriction ("two pluses and one negative") as the
gitlink proposal, the "union" approach would work equally well,
perhaps with a simpler implementation. I'll think aloud about
this at the end.

>> Fetching/cloning at the core level is easy.  "git-fetch-pack"
>> would just need to do one level, but Porcelains need to address
>> how to actually arrange the subprojects cloning to happen, which
>> is harder.
>...
> So only if you actually check it out (which is often in practice the 
> second stage of the cloning, of course) do you want to fetch the 
> subproject too.

We are in complete agreement here.

> I think this one has serious disadvantages:
>
>  - it's much less obvious when there are common names and especially 
>    common subdirectories.
>  - in _practice_, almost all sub-projects are kept in sub-directories. Are 
>    you going to change the sub-project git tree? How are you going to 
>    merge back to the original sub-project?
>  - iow, I think this only works for sub-projects that are totally 
>    controlled by the top-level project - in which case they might as well 
>    just be totally merged into the top level (the way we did with the 
>    "tools" project, and largely with "gitk").

Yes, I agree with the above 100%; the serious disadvantages come
from the fact that we do not have clear separation between
subprojects -- which new files belong to what subproject.  I
think re-rooting read-tree and write-tree would help solving
that.  After I wrote the message you are replying to, I came up
with a couple of tweaks.

 - Do the octopus-like thing, but always give subprojects
   separate directories to work in.

 - Extend "commit" objects for the toplevel project to record
   what subprojects with what head commits are contained at
   which subdirectory.  I wrote in the previous message to make
   subprojects heads parents of aggregate commits, but I think
   that one without "where to" information has a serious
   disadvantage when computing a merge.

In the "embedded linux" example that has the "linux-2.6" and
"gcc-4.0" projects as externally controlled subprojects, and
has all the rest (including the toplevel Makefile) in "master"
branch:

     $ tar xf embed.tar embed && cd embed && git init-db
     $ git add . ;# toplevel Makefile and stuff
     $ git commit -a -m 'embedded repo - initial'

After doing "git-fetch-pack -k git://.../linux-2.6.git/ master"
and "echo $H >.git/refs/heads/kernel" (similar for gcc-4.0) to
set up the branch heads (but we do not have any working tree
files for these subprojects yet):

	$ git bind -m 'Bind kernel and gcc into us' \
        	kernel=linux-2.6 gcc=gcc-4.0

would prepare the subprojects binding (I am just looking for a
better word --- I called it "setup-overlay" in the previous
message).  This would:

 - append the tree object in "kernel" commit object to the
   current index, rerooted at linux-2.6/; similar for "gcc" at
   gcc-4.0/. We may need a new mode and option for read-tree for
   this, or we may not.  Internally this step would be scripted
   in "git bind" wrapper like this:

	git read-tree --bind --prefix=linux-2.6 kernel
	git read-tree --bind --prefix=gcc-4.0 gcc

   and would result in an index file that has these trees
   "mounted" at specified places.  If you look at only the index
   file, you cannot tell this is an overlay, unlike gitlink
   scheme.

 - make a commit that records the tree object (the whole thing
   including the subproject trees), with the initial commit we
   made earlier as the sole parent commit, and additionally
   records the two subproject heads with bind points.  This
   happens in the same "git bind" wrapper, and produces
   something like:

	$ git cat-file commit HEAD
	tree e9de76f2e141824439caa00a65e3b91d05d125c9
	parent bfca932434cc65e7aa90794e7c4d66f75d00b16a
	bind a8fe7257b8427d31cfcca0aa336335bb43689fc9 linux-2.6
	bind b3b2df23226634f42c9646bd7961fbea8b00f914 gcc-4.0
	author Junio C Hamano <junkio@cox.net> 1137205528 -0800
	committer Junio C Hamano <junkio@cox.net> 1137205528 -0800

	Bind kernel and gcc into us.

   "bind" line needs to be taught to fsck-objects.  The format
   is the object name of the commit followed by (c-style quoted)
   subdirectory name.

 - record the branch name vs subproject directory binding in
   $GIT_DIR/ somewhere, say $GIT_DIR/mtab ;-).

	$ cat .git/mtab
	kernel	linux-2.6
	gcc	gcc-4.0
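
A Porcelain wrapper would need to pull the bind points back out of
such a commit.  The following is a minimal sketch, assuming the
proposed (not existing) "bind" header format shown above and ignoring
the c-style quoting of unusual directory names; the commit text is
pasted in as a string so the example runs without any repository.

```shell
# The proposed commit object from the example above, as plain text.
commit_text='tree e9de76f2e141824439caa00a65e3b91d05d125c9
parent bfca932434cc65e7aa90794e7c4d66f75d00b16a
bind a8fe7257b8427d31cfcca0aa336335bb43689fc9 linux-2.6
bind b3b2df23226634f42c9646bd7961fbea8b00f914 gcc-4.0
author Junio C Hamano <junkio@cox.net> 1137205528 -0800
committer Junio C Hamano <junkio@cox.net> 1137205528 -0800

Bind kernel and gcc into us.'

# Read header lines only (stop at the first blank line) and print
# "<directory> <commit>" for every bind line.
binds=$(printf '%s\n' "$commit_text" |
    awk '/^$/ { exit } $1 == "bind" { print $3, $2 }')
echo "$binds"
```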

After this, "git checkout-index -f -q -u -a" would populate the
whole thing.  Instead of linux-2.6/.git/HEAD as in gitlink
example, I am using .git/refs/heads/kernel; this would not make
a semantic difference.  One big difference however is I have
only one index file that controls the whole tree, without using
a separate linux-2.6/.git/index.

After mucking with a file in linux-2.6/ subdirectory and nowhere
else, committing the result from the whole tree would work like
this:

 - Look at the current commit and notice the bind for two
   subdirectories; then look them up in $GIT_DIR/mtab to see
   which branches keep track of them.

 - Notice that there are modified paths in the index vs tree
   from the last commit under linux-2.6/ directory.

 - Write out only that part, re-rooted, into a tree.

	git write-tree --prefix=linux-2.6

 - Make a commit to record that tree, with a parent set to the
   "kernel" branch head; update the "kernel" branch head at that
   commit.

 - Make another commit to record the tree made from the whole
   index (obviously linux-2.6 subdirectory would result in the
   same tree object we just committed in the subproject) with
   parent set to .git/HEAD and bind adjusted accordingly; update
   the "HEAD".
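
The re-rooted write-out in the steps above can be sketched with
options that git has today ("read-tree --prefix" and "write-tree
--prefix"); the proposed "--bind" option and the "bind" commit header
are left out because they do not exist.  The one-file "kernel" tree and
the Makefile are made up for the illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A tiny stand-in for the kernel tree: one blob named CREDITS.
blob=$(echo 'credits' | git hash-object -w --stdin)
kernel_tree=$(printf '100644 blob %s\tCREDITS\n' "$blob" | git mktree)

# Toplevel scaffolding plus the kernel tree read in under linux-2.6/.
echo 'all:' >Makefile
git add Makefile
git read-tree --prefix=linux-2.6/ "$kernel_tree"

# Write out only the linux-2.6/ part, re-rooted, and then the whole index.
sub_tree=$(git write-tree --prefix=linux-2.6/)
whole_tree=$(git write-tree)

echo "subproject tree: $sub_tree"
echo "whole tree:      $whole_tree"
```

The re-rooted tree is the same object as the tree that was read in,
and it is also what the whole tree records at linux-2.6, which is the
property the last step in the list relies on.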

Now I have to think about clones and merges but this is getting
too long so I'll leave it to a separate message.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14 20:02             ` Linus Torvalds
@ 2006-01-14 20:30               ` A Large Angry SCM
  2006-01-14 20:38                 ` Junio C Hamano
  2006-01-16  7:48               ` Alex Riesen
  1 sibling, 1 reply; 56+ messages in thread
From: A Large Angry SCM @ 2006-01-14 20:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Junio C Hamano, Johannes Schindelin, Simon Richter

Linus Torvalds wrote:
> 
> On Sat, 14 Jan 2006, A Large Angry SCM wrote:
>>So far I've not seen any convincing arguments why the sub-projects can not be
>>managed by the Makefile, or equivalent, of the super-project. Particularly
>>when the sub-projects have a life of their own.
> 
> Now, from a developer standpoint I actually agree with you. I find 
> sub-projects totally useless - I'm much happier just having separate 
> trees.
> 
> The advantage (as far as I can tell) of sub-projects is not that they are 
> easier to develop in, but that it's a total nightmare for the technical 
> _user_ to download ten different projects from ten different sites, and 
> configure them properly and install them in the right order, and keep them 
> up-to-date.
> 
> There are projects that I simply gave up even trying to track: I wasn't 
> interested in being a developer per se, but I _was_ interested in trying 
> to test and give feedback to the current development tree - but it was 
> just too damn confusing to get it working.
> 
> If I could have just done a "git clone <top-level>" to get it all, I'd 
> have been a much more productive user.

$ make get_sub_components

This can work with most any SCM (depending on your environment), is 
amazingly flexible, and does not require special support in the SCM.

The "get" rule for each sub-project could be something like:

	git_sub-project:
		mkdir sub-project
		cd sub-project
		git-init-db
		git-fetch <fetch-options> <repository> <refspec>
		git-checkout <branch>
		$(MAKE) get_sub_components
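
One caveat when turning the outline above into a real rule: make runs
each recipe line in its own shell (unless told otherwise), so a "cd" on
a line by itself does not carry over to the next line.  The sketch
below imitates that with separate "sh -c" calls; the directory name is
just the placeholder from the rule.

```shell
set -e
work=$(mktemp -d)
mkdir "$work/sub-project"
cd "$work"

# Two recipe lines run as two separate shells: the cd is lost.
sh -c 'cd sub-project'
separate=$(sh -c 'pwd')

# Chaining the commands on one line keeps the directory change.
chained=$(sh -c 'cd sub-project && pwd')

echo "separate lines end up in: $(basename "$separate")"
echo "chained line ends up in:  $(basename "$chained")"
```

In a Makefile the same effect is had by joining the recipe lines with
"&&" and a trailing backslash.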

> 
> This is why I think sub-projects are more about "git checkout" and an 
> automated "git fetch" than anything else. Doing actual development etc you 
> can easily do one project at a time. "git diff" and "git commit" wouldn't 
> need any real ability to recurse into subprojects and try to make it 
> seamless. And if you do a "git pull" that needs to do anything but 
> fast-forward, you might as well resolve the sub-projects one by one.

And all of this can be done today, without changing git, with more 
flexibility, with Make rules.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14 20:30               ` A Large Angry SCM
@ 2006-01-14 20:38                 ` Junio C Hamano
  2006-01-15  0:28                   ` Martin Langhoff
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2006-01-14 20:38 UTC (permalink / raw)
  To: gitzilla; +Cc: git, Junio C Hamano, Johannes Schindelin, Simon Richter

A Large Angry SCM <gitzilla@gmail.com> writes:

>> If I could have just done a "git clone <top-level>" to get it all,
>> I'd have been a much more productive user.
>
> $ make get_sub_components
>
> This can work with most any SCM (depending on your environment), is
> amazingly flexible, and does not require special support in the SCM.

I am with you two on this one, in principle, as a developer.

> The "get" rule for each sub-project could be something like:
>
> 	git_sub-project:
> 		mkdir sub-project
> 		cd sub-project
> 		git-init-db
> 		git-fetch <fetch-options> <repository> <refspec>
> 		git-checkout <branch>
> 		$(MAKE) get_sub_components

There lies a drake here --- <repository> is not the same for
everybody.  It is not a big showstopper dragon, though.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14 20:38                 ` Junio C Hamano
@ 2006-01-15  0:28                   ` Martin Langhoff
  2006-01-15  0:49                     ` Junio C Hamano
  2006-01-16  5:06                     ` Daniel Barkalow
  0 siblings, 2 replies; 56+ messages in thread
From: Martin Langhoff @ 2006-01-15  0:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: gitzilla, git, Johannes Schindelin, Simon Richter

On 1/15/06, Junio C Hamano <junkio@cox.net> wrote:
> > The "get" rule for each sub-project could be something like:
> >
> >       git_sub-project:
> >               mkdir sub-project
> >               cd sub-project
> >               git-init-db
> >               git-fetch <fetch-options> <repository> <refspec>
> >               git-checkout <branch>
> >               $(MAKE) get_sub_components
>
> There lies a drake here --- <repository> is not the same for
> everybody.  It is not a big showstopper dragon, though.

Well, that /little complication/ applies to doing it in git too ;-)
There's no way to tell how the dev doing the top level checkout has
access to the subproject repos.

I am with gitzilla on this one. Let the projects have their own
bootstrapping mechanisms, using make, ant or whatever catches their
fancy. One of the great things about git is that it doesn't assume
that it's being used by all the projects in the world -- thanks to
Linus' disregard for arbitrary metadata and to your git-cherry
implementation, it's all about the content -- and so it interoperates
great with Arch, SVN, CVS, etc.

Having intra-git subproject support assumes that the subprojects are
all in git. Heh! That covers about 0.001% of reality out there.
Per-project bootstrapping scripts will use whatever tools they need for
the checkout.

Automating the 'checkout' stage for git subprojects is trivial, and
I'd argue not interesting enough to try and solve within git,
especially when most subprojects are going to be using a different SCM
anyway. And all the *interesting* operations (branch, commit, tag) are
perhaps indeed interesting problems to solve, but definite misfeatures
in a tool that tries to be sane and minimalistic.

IOW, adding some repo metadata describing subprojects is the wrong
thing to do, just like tracking patches via metadata would be the
wrong thing to do. It's all about files -- which git handles
masterfully.

cheers,

martin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-15  0:28                   ` Martin Langhoff
@ 2006-01-15  0:49                     ` Junio C Hamano
  2006-01-15  1:55                       ` Tom Prince
  2006-01-16  5:06                     ` Daniel Barkalow
  1 sibling, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2006-01-15  0:49 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: gitzilla, git, Johannes Schindelin, Simon Richter

Martin Langhoff <martin.langhoff@gmail.com> writes:

> I am with gitzilla on this one. Let the projects have their own
> bootstrapping mechanisms, using make, ant or whatever catches their
> fancy. One of the great things about git is that it doesn't assume
> that it's being used by all the projects in the world -- thanks to
> Linus' disregard for arbitrary metadata and to your git-cherry
> implementation, it's all about the content -- and so it interoperates
> great with Arch, SVN, CVS, etc.

I had exactly the same reaction when I saw the project
bundling facility of Arch (tla 1.0 -- I do not know what the
newer versions use).  It probably was a great way to tie two or
more Arch projects together, but it would quickly become less
useful once the component project is outside Arch space and the
toplevel project would end up with doing some Makefile targets
like ALASCM described.

I hope this settles this issue and nobody would bring up "Wee
want subprojects" ever again ;-).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-14 20:16           ` Junio C Hamano
@ 2006-01-15  1:01             ` Junio C Hamano
  2006-01-16 10:44             ` Josef Weidendorfer
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-15  1:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Continuing with the "union" approach...

Junio C Hamano <junkio@cox.net> writes:

>  - append the tree object in "kernel" commit object to the
>    current index, rerooted at linux-2.6/; similar for "gcc" at
>    gcc-4.0/. We may need a new mode and option for read-tree for
>    this, or we may not.  Internally this step would be scripted
>    in "git bind" wrapper like this:
>
> 	git read-tree --bind --prefix=linux-2.6 kernel
> 	git read-tree --bind --prefix=gcc-4.0 gcc
>
>    and would result in an index file that has these trees
>    "mounted" at specified places...

Clarification.  By "mounted", I mean 'without affecting existing
index entries, create index entries from the tree, with all the
paths having "linux-2.6/" prefixed to them'.
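
This kind of "mounting" can be tried with the "--prefix" option that
read-tree has today; "--bind" is part of the proposal and is not used
here.  A minimal sketch, with a made-up one-file tree standing in for
the kernel and a made-up toplevel Makefile:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# An existing index entry that must not be disturbed.
echo 'all:' >Makefile
git add Makefile

# A stand-in tree for the "kernel" branch: one blob named CREDITS.
blob=$(echo 'credits' | git hash-object -w --stdin)
kernel_tree=$(printf '100644 blob %s\tCREDITS\n' "$blob" | git mktree)

# Create index entries from that tree with "linux-2.6/" prefixed.
git read-tree --prefix=linux-2.6/ "$kernel_tree"

paths=$(git ls-files)
echo "$paths"
```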

>  - record the branch name vs subproject directory binding in
>    $GIT_DIR/ somewhere, say $GIT_DIR/mtab ;-).
>
> 	$ cat .git/mtab
> 	kernel	linux-2.6
>       gcc	gcc-4.0

I now realize this needs to be something like:

	master	kernel=linux-2.6/ gcc=gcc-4.0/

that is, "when on branch master, bind these two heads at these
directories", to allow switching to another branch and switching
back to this branch.  And the file should probably be called
$GIT_DIR/modules, to parallel CVSROOT/modules file.
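
Reading such a line back is simple enough for a shell wrapper.  The
sketch below assumes the tentative one-line format above (branch name,
then whitespace-separated head=directory pairs); the function name
parse_modules_line is made up, and names containing whitespace or glob
characters are not handled.

```shell
# Print "<branch> <head> <directory>" for every binding on one line
# of the hypothetical $GIT_DIR/modules file.
parse_modules_line() {
    set -- $1            # split the line on whitespace
    branch=$1
    shift
    for pair in "$@"; do
        # head is the text before the first "=", directory the rest.
        printf '%s %s %s\n' "$branch" "${pair%%=*}" "${pair#*=}"
    done
}

out=$(parse_modules_line 'master	kernel=linux-2.6/ gcc=gcc-4.0/')
echo "$out"
```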

>	$ git cat-file commit HEAD
>       tree e9de76f2e141824439caa00a65e3b91d05d125c9
>       parent bfca932434cc65e7aa90794e7c4d66f75d00b16a
>       bind a8fe7257b8427d31cfcca0aa336335bb43689fc9 linux-2.6
>       bind b3b2df23226634f42c9646bd7961fbea8b00f914 gcc-4.0
>       author Junio C Hamano <junkio@cox.net> 1137205528 -0800
>       committer Junio C Hamano <junkio@cox.net> 1137205528 -0800
>
>	Bind kernel and gcc into us.
>...
> Now I have to think about clones and merges but this is getting
> too long so I'll leave it to a separate message.

The core-level cloning would just "clone" the objects, treating
"bind" line in the commit just like "parent" to pull necessary
objects.

Checkout would involve the usual read-tree -u which extracts the
tree (which is the whole tree, with files of the subprojects in
it), and notices "bind" lines are there but there are no
matching $GIT_DIR/modules entries for those directories.
Probably it would create $GIT_DIR/refs/heads/bind/a8fe725 for
the linux-2.6 subproject (what the original committer called
"kernel" branch), and similarly for the gcc-4.0 subproject, add
an appropriate entry to $GIT_DIR/modules file.  The user would
then rename the branch names and optionally arrange remotes/
files to update the bound branches appropriately:

	$ mv .git/refs/heads/bind/a8fe725 .git/refs/heads/kernel

Now, let's say this "master" branch is checked out, and somehow
the "kernel" branch gets updated.  That is, the commit recorded
on the "bind" line of the HEAD commit does not match the branch
head that can be found out via $GIT_DIR/modules file.  This will
not happen if you are committing into the "master" branch using
the "commit to subprojects and then to the toplevel project"
mechanism yourself, but it would happen if the "kernel" branch
was moved by "git fetch" fast-forwarding, or if you switched to
the "kernel" branch (which would essentially remove everything
from your tree, and checkout the kernel source at the root
level, not in linux-2.6/ subdirectory), did an upstream merge
yourself, and switched back to the "master" branch.

To keep the problem simpler, let's say we only deal with the
case where "kernel" branch head is a fast-forward of what is on
"bind" in the HEAD commit of "master" branch.  Then "checkout"
needs to notice it, and check out the subdirectory from the
"kernel" branch head (*not* using the object name on "bind"
line).

So the outline of the "checkout" would be like this:

 * Read commit object from new HEAD.

 * For each "bind" line:

   If the subdirectory does not have a corresponding branch,
   create one in $GIT_DIR/refs/heads/bind/; record it in
   $GIT_DIR/modules for the new branch (otherwise leave branch
   as is).

   Make sure the commit recorded on "bind" line is an ancestor
   of the branch head.  Otherwise it is an error and checkout is
   prevented until the "kernel" branch is resolved to be a
   descendant of it.

   Run "read-tree -u --prefix=" to merge in the subtree into the
   index, and update the working tree.
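
The 'for each "bind" line' step in the outline above could start from
something like the following sketch.  It reads raw commit text in the
format shown earlier and prints each bound commit with its directory.
The "bind" header is only a proposal in this thread (commit objects do
not carry it), and list_binds is a name invented here.

```shell
# Hypothetical: read a raw commit object on stdin (as printed by
# "git cat-file commit") and print "<sha1> <directory>" for every
# "bind" header line.
list_binds() {
	while read -r key sha1 dir; do
		# An empty line ends the header; the rest is the log message.
		[ -n "$key" ] || break
		if [ "$key" = bind ]; then
			printf '%s %s\n' "$sha1" "$dir"
		fi
	done
}
```

A checkout wrapper would then take each pair, compare the commit with
the head of the bound branch, and only after that run the read-tree
step for the directory.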

At this point, there may be mismatch between the tree in the
HEAD and the working tree files and index, when subproject
commit recorded on the "bind" line is different from the
corresponding subproject branch head, and "git diff" would show
it.  When making a commit here, the "subproject and then
toplevel" commit scheme I described earlier would record the
current "kernel" branch head on the "bind" line in the new
commit, along with the tree object that contains the tree from
"kernel" branch head commit as a subtree.

About "merge", we should be able to do this:

	$ git checkout master ;# the whole mess
	$ git pull -b kernel git://..torvalds/linux-2.6.git/

that is, 'pull from this URL but into "kernel" branch not to the
current branch'.  Independent of this "subprojects" topic,
merging in a separate temporary directory into non-current
branch is something we have talked about some time ago, and in
this particular case, instead of using a throw-away temporary
directory, we have a pre-made directory to do the merge already,
so let's say that is solved elsewhere first.  Once we have that,
the above "checkout" would be able to integrate the result into
the "master" project.


* Re: RFC: Subprojects
  2006-01-15  0:49                     ` Junio C Hamano
@ 2006-01-15  1:55                       ` Tom Prince
  0 siblings, 0 replies; 56+ messages in thread
From: Tom Prince @ 2006-01-15  1:55 UTC (permalink / raw)
  To: git

On Sat, Jan 14, 2006 at 04:49:15PM -0800, Junio C Hamano wrote:
> Martin Langhoff <martin.langhoff@gmail.com> writes:
> 
> > I am with gitzilla on this one. Let the projects have their own
> > bootstrapping mechanisms, using make, ant or whatever catches their
> > fancy. One of the great things about git is that it doesn't assume
> > that it's being used by all the projects in the world -- thanks to
> > Linus' disregard for arbitrary metadata and to your git-cherry
> > implementation, it's all about the content -- and so it interoperates
> > great with Arch, SVN, CVS, etc.
> 
> 
> I hope this settles this issue and nobody would bring up "We
> want subprojects" ever again ;-).
> 

But since we can import everything into a GIT repository, and have
(some) tools for pushing changes back, we can pretend that it is being
used for every project in the world.

  Tom


* [RFC][PATCH] Cogito support for simple subprojects
  2006-01-11 15:58 RFC: Subprojects Simon Richter
  2006-01-11 16:44 ` Johannes Schindelin
  2006-01-12  3:19 ` Alexander Litvinov
@ 2006-01-15 15:07 ` Petr Baudis
  2006-01-15 17:38   ` Linus Torvalds
  2006-01-15 19:15   ` Junio C Hamano
  2 siblings, 2 replies; 56+ messages in thread
From: Petr Baudis @ 2006-01-15 15:07 UTC (permalink / raw)
  To: Simon Richter; +Cc: git

  Hello,

  I've tried to take a different approach - KISS and don't make the
subprojects part of the git-tracked tree but a thing purely local to
your particular checkout. Subprojects are simply listed in
.git/subprojects and various commands are called recursively on them.

  - No auto-cloning of subprojects is possible.
  - Switching between branches and merging is troublesome in case the
    required subprojects arrangement changes in between. Now, this is
    a matter of taste - I don't see this as a huge problem since you
    can make special provisions for that, and it should be so rare
    that it's not worth optimizing for in my eyes.
  + It's simple.
  + It's flexible. You can have _optional_ subprojects - IIRC e.g.
    mplayer can use ffmpeg if it's checked out as a subproject but
    will use its own copy if it's not there. You can clone subprojects
    based e.g. on selected features before compilation. If you do so,
    your GIT won't bother you with uncommitted local changes, and you
    will not have to filter this out from any other changes you are
    going to commit.

  The main goal of this is simply to be able to check out a bunch of
stuff to subdirectories and make cg-update update all of it, without
any special scripts. ;-)

  This patch is just a trivial proof-of-concept thing which makes only
cg-update and cg-fetch aware of the subprojects; many commands still
need to be taught about subprojects but some don't - I currently don't
think a recursive cg-merge is a particularly good idea, for one.
I think the good default is to make all read-only commands by default
recursive and all modifying commands by default non-recursive. (And
it might be useful to be able to mark some subprojects read-only.)

  How to create a subproject? Simply cg-clone inside a working copy,
it will register it automagically. So far there are no tools to further
maintain the subprojects, though, therefore a mv or rm needs to be
followed by an appropriate modification of parent .git/subprojects.
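
Until such tools exist, the manual fix-up after removing a subproject
could be scripted roughly as follows.  This is only a sketch against
the one-path-per-line file that the patched cg-clone writes;
subproject_unregister is a hypothetical helper, not part of Cogito.

```shell
# Hypothetical helper: drop one path from a subprojects list file
# (one path per line).
# Usage: subproject_unregister LISTFILE SUBPROJECT_PATH
subproject_unregister() {
	list=$1
	entry=$2
	[ -f "$list" ] || return 0
	# -F: fixed string, -x: match the whole line, -v: keep the rest.
	# grep exits with 1 when no line is left to print, which is
	# fine here; any other failure aborts and leaves the list alone.
	grep -Fxv -e "$entry" "$list" >"$list.tmp" || [ $? -eq 1 ] || {
		rm -f "$list.tmp"
		return 1
	}
	mv "$list.tmp" "$list"
}
```

For example, after "rm -rf prj2" in the parent working tree one would
run "subproject_unregister .git/subprojects prj2".  A rename would be
the same call followed by appending the new path to the file.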


  This is not committed yet - I'm curious about your opinions.

diff --git a/TODO b/TODO
index 0658b39..29daf30 100644
--- a/TODO
+++ b/TODO
@@ -90,23 +90,13 @@ cg-*patch should be pre-1.0.)
 
 * cg-Xfetchprogress showing smooth progress for packfiles
 
+* Enhance subprojects notion
+	So far the subprojects support is trivial and prone to user error.
+	E.g. cg-add should check if it doesn't poke into a subproject,
+	cg-status should list subprojects, etc.
 
-Post 1.0:
-
-* Subprojects
-	Support a GIT project inside a GIT project:
 
-		x/.git
-		x/foo/bar/.git
-		x/foo/bar/baz/.git
-		x/quux/zot/.git
-
-	That means cg-update working recursively and cg-add'n'stuff
-	checking if there isn't another .git along the path of its
-	argument.
-
-	Needs more thought, especially wrt. fetching and merging
-	recursive semantics.
+Post 1.0:
 
 * Comfortable cg-log
 	Probably make it a real terminal application, not just less
diff --git a/cg-Xlib b/cg-Xlib
index 46a8a73..6e6265e 100755
--- a/cg-Xlib
+++ b/cg-Xlib
@@ -197,6 +197,10 @@ list_untracked_files()
 		if [ -f "$EXCLUDEFILE" ]; then
 			EXCLUDE[${#EXCLUDE[@]}]="--exclude-from=$EXCLUDEFILE"
 		fi
+		EXCLUDEFILE="$_git/subprojects"
+		if [ -f "$EXCLUDEFILE" ]; then
+			EXCLUDE[${#EXCLUDE[@]}]="--exclude-from=$EXCLUDEFILE"
+		fi
 		# This is just for compatibility (2005-09-16).
 		# To be removed later.
 		EXCLUDEFILE="$_git/exclude"
@@ -209,6 +213,29 @@ list_untracked_files()
 	git-ls-files -z --others "${EXCLUDE[@]}"
 }
 
+# Usage: subprojects_recurse ACTIONNAME COMMAND...
+# Run command recursively on subprojects, displaying warning using the
+# ACTIONNAME string in case any of them failed.
+subprojects_recurse()
+{
+	[ -s "$_git/subprojects" ] || return 0
+	local failures=0 subprj= actionname="$1" s=
+	local Actionname="$(echo "$actionname" | perl -pe '$_=ucfirst')"
+	shift
+	while IFS= read -r subprj; do
+		echo "Running $actionname in $subprj..." >&2
+		if ( cd "$subprj" && "$@" ); then
+			echo "$Actionname in $subprj succeeded." >&2
+		else
+			echo "$Actionname in $subprj failed!" >&2
+			failures=$(($failures+1))
+		fi
+	done <"$_git/subprojects"
+	local s=; if [ $failures -gt 1 ]; then s=s; fi
+	[ $failures -gt 0 ] && echo "Warning: $failures subproject $actionname$s failed" >&2
+	return $failures
+}
+
 # Usage: showdate SECONDS TIMEZONE [FORMAT]
 # Display date nicely based on how GIT stores it.
 # Save the date to $_showdate
diff --git a/cg-clone b/cg-clone
index f86a548..dfb6dc8 100755
--- a/cg-clone
+++ b/cg-clone
@@ -11,6 +11,15 @@
 # parameter is omitted, the basename of the source repository is used as the
 # destination.
 #
+# If you are cloning inside another working tree, you are automatically
+# establishing a subproject - that means that when you will update the
+# parent project, this project will be auto-updated as well, and in the
+# future, certain other operations may also recurse to subprojects. Use the
+# -P option to prevent this from becoming a subproject. Also please note
+# that the subprojects support is preliminary and subject to change. If you
+# remove the subproject later, you must also remove the corresponding entry
+# in '.git/subprojects' for now.
+#
 # OPTIONS
 # -------
 # -l::	Symlink the object database when cloning locally
@@ -24,20 +33,28 @@
 #	Note that you MUST NOT prune repository containing a symlink
 #	or being symlinked to.
 #
+# -P::	Prevent this from becoming a subproject
+#	If this option is passed, cg-clone will never create a subproject
+#	even if called inside working tree of another project.
+#
 # -s::	Clone into the current directory
 #	Clone in the current directory instead of creating a new one.
 #	Specifying both -s and a destination directory makes no sense.
+#	This also implies -P.
 
-USAGE="cg-clone [-l] [-s] LOCATION [DESTDIR]"
+USAGE="cg-clone [-l] [-P] [-s] LOCATION [DESTDIR]"
 _git_repo_unneeded=1
 
 . ${COGITO_LIB}cg-Xlib || exit 1
 
 same_dir=
 symlink=
+may_subproject=1
 while optparse; do
 	if optparse -l; then
 		symlink=1
+	elif optparse -P; then
+		may_subproject=
 	elif optparse -s; then
 		same_dir=1
 	else
@@ -65,7 +82,13 @@ else
 	location="$location"
 fi
 
+parentprj=
+parentpath=
 if [ ! "$same_dir" ]; then
+	if [ "$may_subproject" ]; then
+		parentprj="$(git-rev-parse --git-dir 2>/dev/null)"
+		parentpath="$(git-rev-parse --show-prefix 2>/dev/null)$dir"
+	fi
 	[ -e "$dir" ] && die "$dir/ already exists"
 	mkdir "$dir" || exit $?
 	cd "$dir" || exit $?
@@ -94,4 +117,14 @@ cp "$_git/refs/heads/origin" "$_git/refs
 	git-update-index --refresh ||
 	exit 1
 
+if [ "$parentprj" ]; then
+	cd ..
+	parentroot="${parentprj%.git}"
+	if [ -z "$parentroot" ]; then
+		parentroot="$(pwd)"
+	fi
+	echo "Registering as a subproject of $parentroot..."
+	echo "$parentpath" >>"$parentprj"/subprojects
+fi
+
 echo "Cloned to $dir/ (origin $location available as branch \"origin\")"
diff --git a/cg-fetch b/cg-fetch
index c3dfb81..9d44d9f 100755
--- a/cg-fetch
+++ b/cg-fetch
@@ -28,6 +28,10 @@
 # -f:: Force the complete fetch even if the heads are the same.
 #	Force the complete fetch even if the heads are the same.
 #
+# -R:: Do not recurse to subprojects
+#	Do not recursively fetch the subprojects (see the `cg-clone`
+#	documentation for more information).
+#
 # -v:: Enable verbosity
 #	Display more verbose output - most notably list all the files
 #	touched by the fetched changes.
@@ -53,7 +57,7 @@
 #	won't unpack the transferred pack.
 
 
-USAGE="cg-fetch [-f] [-v] [BRANCH_NAME]"
+USAGE="cg-fetch [-f] [-R] [-v] [BRANCH_NAME]"
 
 . ${COGITO_LIB}cg-Xlib || exit 1
 deprecated_alias cg-fetch cg-pull
@@ -234,12 +238,15 @@ fetch_tags()
 
 
 recovery=
+recurse=1
 verbose=
 while optparse; do
 	if optparse -f; then
 		# When forcing, let the fetch tools make more extensive
 		# walk over the dependency tree with --recover.
 		recovery=--recover
+	elif optparse -R; then
+		recurse=
 	elif optparse -v; then
 		verbose=1
 	else
@@ -248,9 +255,10 @@ while optparse; do
 done
 
 name="${ARGS[0]}"
-
 [ "$name" ] || { [ -s "$_git/branches/origin" ] && name=origin; }
+[ "$recurse" ] && subprojects_recurse "fetch" cg-fetch ${recovery:+-f} ${verbose:+-v} "$name"
 [ "$name" ] || die "where to fetch from?"
+
 uri=$(cat "$_git/branches/$name" 2>/dev/null) || die "unknown branch: $name"
 
 rembranch=
diff --git a/cg-update b/cg-update
index 1d6e0a0..1b21338 100755
--- a/cg-update
+++ b/cg-update
@@ -25,23 +25,30 @@
 # -f:: Force the complete fetch even if the heads are the same.
 #	Force the complete fetch even if the heads are the same.
 #
+# -R:: Do not recurse to subprojects
+#	Do not recursively update the subprojects (see the `cg-clone`
+#	documentation for more information).
+#
 # --squash:: Use "squash" merge to record pending commits as a single merge commit
 #	"Squash" merge - condense all the to-be-merged commits to a single
 #	merge commit. This is not to be used lightly; see the cg-merge
 #	documenation for further details.
 
-USAGE="cg-update [-f] [--squash] [BRANCH_NAME]"
+USAGE="cg-update [-f] [-R] [--squash] [BRANCH_NAME]"
 _git_requires_root=1
 
 . ${COGITO_LIB}cg-Xlib || exit 1
 
 force=
 squash=
+recurse=1
 while optparse; do
 	if optparse -f; then
 		force=-f
 	elif optparse --squash; then
 		squash=--squash
+	elif optparse -R; then
+		recurse=
 	else
 		optfail
 	fi
@@ -49,13 +56,14 @@ done
 
 name="${ARGS[0]}"
 [ "$name" ] || { [ -s "$_git/branches/origin" ] && name=origin; }
+[ "$recurse" ] && subprojects_recurse "update" cg-update $force $squash "$name"
 [ "$name" ] || die "where to update from?"
 
 # cg-merge can do better decision about fast-forwarding if it sees this.
 [ -s "$_git/refs/heads/$name" ] && export _cg_orig_head="$(cat "$_git/refs/heads/$name")"
 
 if [ -s "$_git/branches/$name" ]; then
-	cg-fetch $force "$name" || exit 1
+	cg-fetch -R $force "$name" || exit 1
 else
 	echo "Updating from a local branch."
 fi
diff --git a/t/t9215-update-recursive.sh b/t/t9215-update-recursive.sh
new file mode 100755
index 0000000..d0e27a4
--- /dev/null
+++ b/t/t9215-update-recursive.sh
@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2005 Petr Baudis
+#
+test_description="Tests recursive cg-update functionality
+
+Create a subproject and then try to cg-update."
+
+. ./test-lib.sh
+
+mkdir prj1
+echo file >prj1/file
+test_expect_success 'initialize project 1' \
+	"(cd prj1 && cg-init -I && cg-add file && cg-commit -C -m\"Initial commit\")"
+
+mkdir prj2
+echo file >prj2/FILE
+test_expect_success 'initialize project 2' \
+	"(cd prj2 && cg-init -I && cg-add FILE && cg-commit -C -m\"Initial commit\")"
+
+test_expect_success 'clone project 1' \
+	"cg-clone prj1 clone"
+test_expect_success 'clone project 2 as subproject of project 1' \
+	"(cd clone && cg-clone ../prj2)"
+
+test_expect_success 'commit to project 1' \
+	"(cd prj1 && echo foo >>file && cg-commit -m\"Commit in project 1\")"
+test_expect_success 'commit to project 2' \
+	"(cd prj2 && echo bar >>FILE && cg-commit -m\"Commit in project 2\")"
+
+test_expect_success 'non-recursive update' \
+	"(cd clone && cg-update -R)"
+test_expect_success 'check if the prj1 head was updated in clone' \
+	"(cmp prj1/.git/refs/heads/master clone/.git/refs/heads/master)"
+test_expect_success 'check if the prj1 working copy was updated in clone' \
+	"(cmp prj1/file clone/file)"
+test_expect_failure 'check if the prj2 head was not updated in clone' \
+	"(cmp prj2/.git/refs/heads/master clone/prj2/.git/refs/heads/origin)"
+
+test_expect_success 'commit to project 1' \
+	"(cd prj1 && echo baz >>file && cg-commit -m\"Commit in project 1\")"
+
+test_expect_success 'recursive update' \
+	"(cd clone && cg-update)"
+test_expect_success 'check if the prj1 head was updated in clone' \
+	"(cmp prj1/.git/refs/heads/master clone/.git/refs/heads/master)"
+test_expect_success 'check if the prj1 working copy was updated in clone' \
+	"(cmp prj1/file clone/file)"
+test_expect_success 'check if the prj2 head was updated in clone' \
+	"(cmp prj2/.git/refs/heads/master clone/prj2/.git/refs/heads/master)"
+test_expect_success 'check if the prj2 working copy was updated in clone' \
+	"(cmp prj2/FILE clone/prj2/FILE)"
+
+test_done


-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: [RFC][PATCH] Cogito support for simple subprojects
  2006-01-15 15:07 ` [RFC][PATCH] Cogito support for simple subprojects Petr Baudis
@ 2006-01-15 17:38   ` Linus Torvalds
  2006-01-15 19:15   ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2006-01-15 17:38 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Simon Richter, git



On Sun, 15 Jan 2006, Petr Baudis wrote:
> 
>   I've tried to take a different approach - KISS and don't make the
> subprojects part of the git-tracked tree but a thing purely local to
> your particular checkout. Subprojects are simply listed in
> .git/subprojects and various commands are called recursively on them.

This seems very sane. Goodie.

		Linus


* Re: [RFC][PATCH] Cogito support for simple subprojects
  2006-01-15 15:07 ` [RFC][PATCH] Cogito support for simple subprojects Petr Baudis
  2006-01-15 17:38   ` Linus Torvalds
@ 2006-01-15 19:15   ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-15 19:15 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, Simon Richter

Petr Baudis <pasky@suse.cz> writes:

> ... I currently don't
> think a recursive cg-merge is a particularly good idea, for one.
> I think the good default is to make all read-only commands by default
> recursive and all modifying commands by default non-recursive. (And
> it might be useful to be able to mark some subprojects read-only.)

All sounds sane and simple.  Good job!

Especially because this does not even try to let the project
express its version dependencies on its subprojects, I like it
for its simplicity (which makes it very easy to explain, to my
mind that is the biggest plus).  However, I fear that others
might complain to say that the contained things do not deserve
to be called "subprojects" if there is no version linkage [*1*].

I think something like this is greatly helpful for people (like
me) as "an end user who builds from source out of SCM not from
tarballs."

If you go this route, one minor concern is what the format of
the $GIT_DIR/subprojects file should be (and, if they do not deserve
to be called "subprojects", what the name of the file should be),
because this may likely be a "nice if they were compatible" item
across Porcelains.  The list of names separated with LF has two
very big pluses:

 - it is certainly the easiest to parse.

 - it can readily be used as --exclude-from file, as long as you
   do not care about another directory with the same name
   somewhere other than the subproject itself.

Even if the limitation of the latter becomes a real issue, a
separate exclude-from file could easily be generated on the fly
(prefix them with '/' to force "not anywhere in the subtree but
only here" matching, perhaps with escaping shell glob pattern
while you are at it), which does not sound so bad.
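
Generating that separate exclude-from file could look roughly like
the sketch below.  It assumes one subproject path per line, and that
the exclude machinery treats a backslash as an escape character and a
leading '/' as an anchor; subprojects_to_excludes is a name made up
for this example.

```shell
# Sketch: turn a list of subproject paths (one per line on stdin)
# into anchored exclude patterns on stdout.
subprojects_to_excludes() {
	# 1. drop empty lines, which would otherwise become a bare "/"
	# 2. backslash-escape the glob metacharacters ] [ * ? and \
	# 3. prefix "/" so the pattern matches only at the top level
	sed -e '/^$/d' -e 's/[][*?\\]/\\&/g' -e 's|^|/|'
}
```

The result could be written to a temporary file and handed over with
--exclude-from in place of the raw list.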

[Footnote]

*1* I do not personally care about version linkage; this is me
being lazy to avoid core side support ;-).  Yesterday I was
mucking with rev-list code to see how gitlink and/or bind commit
would affect what it needs to do (especially wrt its --objects
flag), and I did not like the potential code impact I saw there.


* Re: RFC: Subprojects
  2006-01-15  0:28                   ` Martin Langhoff
  2006-01-15  0:49                     ` Junio C Hamano
@ 2006-01-16  5:06                     ` Daniel Barkalow
  2006-01-16 19:08                       ` A Large Angry SCM
  1 sibling, 1 reply; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-16  5:06 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Junio C Hamano, gitzilla, git, Johannes Schindelin, Simon Richter

On Sun, 15 Jan 2006, Martin Langhoff wrote:

> On 1/15/06, Junio C Hamano <junkio@cox.net> wrote:
> > > The "get" rule for each sub-project could be something like:
> > >
> > >       git_sub-project:
> > >               mkdir sub-project
> > >               cd sub-project
> > >               git-init-db
> > >               git-fetch <fetch-options> <repository> <refspec>
> > >               git-checkout <branch>
> > >               $(MAKE) get_sub_components
> >
> > There lies a drake here --- <repository> is not the same for
> > everybody.  It is not a big showstopper dragon, though.
> 
> Well, that /little complication/ applies to doing it in git too ;-)
> There's no way to tell how the dev doing the top level checkout has
> access to the subproject repos.
> 
> I am with gitzilla on this one. Let the projects have their own
> bootstrapping mechanisms, using make, ant or whatever catches their
> fancy. One of the great things about git is that it doesn't assume
> that it's being used by all the projects in the world -- thanks to
> Linus' disregard for arbitrary metadata and to your git-cherry
> implementation, it's all about the content -- and so it interoperates
> great with Arch, SVN, CVS, etc.

But most of the content of the project that started this thread is the 
revisions of the subprojects. Sure, it could all be done in the build 
system, but then it becomes impractical to manage. Git could refuse to 
support tracking the executable bit on files, or what directories things 
are in, and we could tell people to use their build systems to set these 
things, but it would make the tool impractical to use. We want to track 
some metadata, because it's actually important; what we don't want to 
track is the metadata that is local to the particular working tree. That's 
why we track only one executable bit, not a full set of permissions; it's 
a matter of local policy who can interact with the files in a working 
tree, but it's part of the content whether a file is executable.

So the problem with handling subprojects with the build system is that it 
is too tempting to use the revision control system directly on the 
subproject, at which point the thing you're developing and testing isn't 
at all what other people will get if they check out your commit. You want 
"git status" to report it as an uncommitted change if you have a different 
revision of the subproject than your previous commit had, and it can't 
tell if this information is buried in the build system.

I like Linus's proposal: which revision of which project goes where is 
part of the content, while how you manipulate data for that project is a 
matter of local policy, and is not tracked, although it might be a good 
idea to let the project provide overridable defaults (so that, if you're a
random member of the general public and don't have a special method for 
accessing the repository, you don't have to track it down yourself).

The tricky question is whether we should permit the "subproject" objects 
to specify a revision that isn't a hash, for use in identifying revisions 
of subprojects in other systems.

	-Daniel
*This .sig left intentionally blank*


* Re: RFC: Subprojects
  2006-01-14  8:59       ` Junio C Hamano
  2006-01-14 19:16         ` Linus Torvalds
@ 2006-01-16  7:28         ` Alexander Litvinov
  2006-01-16 10:16           ` Andreas Ericsson
  2006-02-20 13:16         ` Uwe Zeisberger
  2 siblings, 1 reply; 56+ messages in thread
From: Alexander Litvinov @ 2006-01-16  7:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Saturday 14 January 2006 14:59, Junio C Hamano wrote:
> Now I'll think aloud about a completely different design.
>
> We could simply overlay the projects.  I think this is what
> Johannes suggested earlier.
>
> You keep one branch for each "subproject", and make commits into
> each branch (i.e. if you modified files for the upstream kernel,
> the change is committed to the branch for linux-2.6 subproject),
> but when checking things out, you do an equivalent of octopus
> merge across subprojects.
If I understand this idea clearly, it is NOT what I am dreaming about. Almost all
our sub-projects are used in more than one project (imagine a network layer
library). So the variant with gitlink is what I want.


* Re: RFC: Subprojects
  2006-01-14 20:02             ` Linus Torvalds
  2006-01-14 20:30               ` A Large Angry SCM
@ 2006-01-16  7:48               ` Alex Riesen
  1 sibling, 0 replies; 56+ messages in thread
From: Alex Riesen @ 2006-01-16  7:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: A Large Angry SCM, git, Junio C Hamano, Johannes Schindelin,
	Simon Richter

On 1/14/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > So far I've not seen any convincing arguments why the sub-projects can not be
> > managed by the Makefile, or equivalent, of the super-project. Particularly
> > when the sub-projects have a life of their own.
>
> Now, from a developer standpoint I actually agree with you. I find
> sub-projects totally useless - I'm much happier just having separate
> trees.
>
> The advantage (as far as I can tell) of sub-projects is not that they are
> easier to develop in, but that it's a total nightmare for the technical
> _user_ to download ten different projects from ten different sites, and
> configure them properly and install them in the right order, and keep them
> up-to-date.
>
> There are projects that I simply gave up even trying to track: I wasn't
> interested in being a developer per se, but I _was_ interested in trying
> to test and give feedback to the current development tree - but it was
> just too damn confusing to get it working.
>
> If I could have just done a "git clone <top-level>" to get it all, I'd
> have been a much more productive user.
>
> This is why I think sub-projects are more about "git checkout" and an
> automated "git fetch" than anything else. Doing actual development etc you
> can easily do one project at a time. "git diff" and "git commit" wouldn't
> need any real ability to recurse into subprojects and try to make it
> seamless. And if you do a "git pull" that needs to do anything but
> fast-forward, you might as well resolve the sub-projects one by one.

That is exactly how subprojects are used in Perforce- and ClearCase-like SCMs:
the working tree is "configured" to contain the super-project (build
configuration), and the actual work happens in the subproject and _only_ there.
The mentioned systems even have a heavily used permission system just to
prevent either checkout or commit anywhere outside a developer's area of
responsibility. (The "permissions" are somewhat pointless in the git context;
I just mention them to underline the main point.)


* Re: RFC: Subprojects
  2006-01-16  7:28         ` Alexander Litvinov
@ 2006-01-16 10:16           ` Andreas Ericsson
  0 siblings, 0 replies; 56+ messages in thread
From: Andreas Ericsson @ 2006-01-16 10:16 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git

Alexander Litvinov wrote:
> On Saturday 14 January 2006 14:59, Junio C Hamano wrote:
> 
>>Now I'll think aloud about a completely different design.
>>
>>We could simply overlay the projects.  I think this is what
>>Johannes suggested earlier.
>>
>>You keep one branch for each "subproject", and make commits into
>>each branch (i.e. if you modified files for the upstream kernel,
>>the change is committed to the branch for linux-2.6 subproject),
>>but when checking things out, you do an equivalent of octopus
>>merge across subprojects.
> 
> If I understand this idea clearly, it is NOT what I am dreaming about. Almost all
> our sub-projects are used in more than one project (imagine a network layer
> library). So the variant with gitlink is what I want.


Then it isn't so much a subproject as a separate project of its own. 
Otherwise glibc would be a subproject of pretty much everything and 
that's hardly a sane setup.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: RFC: Subprojects
  2006-01-14 20:16           ` Junio C Hamano
  2006-01-15  1:01             ` Junio C Hamano
@ 2006-01-16 10:44             ` Josef Weidendorfer
  2006-01-16 20:49               ` Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: Josef Weidendorfer @ 2006-01-16 10:44 UTC (permalink / raw)
  To: git, Junio C Hamano

On Saturday 14 January 2006 21:16, you wrote:
> Yes, I agree to the above 100%; the serious disadvantages come
> from the fact that we do not have clear separation between
> subprojects -- which new files belong to what subproject.  I
> ...
>  - Extend "commit" objects for the toplevel project to record
>    what subprojects with what head commits are contained at
>    which subdirectory.

The suggested "bind" info in commit objects has the same problem
as the original overlay: if the superproject already has a
subdirectory kernel/, and there is an additional "bind" specification
in commits also for kernel/, what should be done?

So the gitlink object seems to be the only solution if we want to
bind git versions of subprojects into a superproject.

But as this seems to make everything quite complex and not-obvious for
a user, I am with Pasky's simple subproject idea.

Josef


* Re: RFC: Subprojects
  2006-01-16  5:06                     ` Daniel Barkalow
@ 2006-01-16 19:08                       ` A Large Angry SCM
  2006-01-16 20:20                         ` Daniel Barkalow
  0 siblings, 1 reply; 56+ messages in thread
From: A Large Angry SCM @ 2006-01-16 19:08 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Martin Langhoff, Junio C Hamano, git, Johannes Schindelin,
	Simon Richter

Daniel Barkalow wrote:
[...]
> 
> So the problem with handling subprojects with the build system is that it 
> is too tempting to use the revision control system directly on the 
> subproject, at which point the thing you're developing and testing isn't 
> at all what other people will get if they check out your commit. You want 
> "git status" to report it as an uncommitted change if you have a different 
> revision of the subproject than your previous commit had, and it can't 
> tell if this information is buried in the build system.

"git-status" is the wrong tool to use there. What you should be
using is "make project_status". Claiming "that it is too tempting to use 
the revision control system on the subproject" is wrong; you should use 
the SCM (of the subproject) to manage the subproject. You use the build 
system to manage the _entire_ project.

> I like Linus's proposal: which revision of which project goes where is 
> part of the content, while how you manipulate data for that project is a 
> matter of local policy, and is not tracked, although it might be a good 
> idea to let project provide overridable defaults (so that, if you're a 
> random member of the general public and don't have a special method for 
> accessing the repository, you don't have to track it down yourself).

I think Linus' proposal is an attempt to solve the problem in the wrong 
place; it encumbers the SCM with features of limited applicability, that 
impose a specific methodology on how to handle subprojects, and requires 
that the SCM of the subproject be Git.

> The tricky question is whether we should permit the "subproject" objects 
> to specify a revision that isn't a hash, for use in identifying revisions 
> of subprojects in other systems.

Why would you want to limit how required versions of subprojects are 
specified? Your project policies and procedures may require that 
subprojects be specified by a subproject SCM specific immutable revision 
but the policies and procedures of other projects may not be so 
restrictive and could accept a tag identifying the latest "stable" (or 
something) revision.

--

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-16 19:08                       ` A Large Angry SCM
@ 2006-01-16 20:20                         ` Daniel Barkalow
  2006-01-16 22:25                           ` A Large Angry SCM
  0 siblings, 1 reply; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-16 20:20 UTC (permalink / raw)
  To: A Large Angry SCM
  Cc: Martin Langhoff, Junio C Hamano, git, Johannes Schindelin,
	Simon Richter

On Mon, 16 Jan 2006, A Large Angry SCM wrote:

> Daniel Barkalow wrote:
> [...]
> > 
> > So the problem with handling subprojects with the build system is that it is
> > too tempting to use the revision control system directly on the subproject,
> > at which point the thing you're developing and testing isn't at all what
> > other people will get if they check out your commit. You want "git status"
> > to report it as an uncommitted change if you have a different revision of
> > the subproject than your previous commit had, and it can't tell if this
> > information is buried in the build system.
> 
> Using "git-status" is the wrong tool to use there. What you should be using is
> "make project_status". Claiming "that it is too tempting to use the revision
> control system on the subproject" is wrong; you should use the SCM (of the
> subproject) to manage the subproject. You use the build system to manage the
> _entire_ project.

I'm talking about using "git status" on the main project, in case you're 
misunderstanding me. If you can manage the entire project with the build 
system, then you don't need git or any version control at all, aside from 
your build system. But you'd also lose the ability to use webgit, bisect, 
gitk, git log, and so forth on the project as a whole.

> > The tricky question is whether we should permit the "subproject" objects to
> > specify a revision that isn't a hash, for use in identifying revisions of
> > subprojects in other systems.
> 
> Why would you want to limit how required versions of subprojects are
> specified? Your project policies and procedures may require that subprojects
> be specified by a subproject SCM specific immutable revision but the policies
> and procedures of other projects may not be so restrictive and could accept a
> tag identifying the latest "stable" (or something) revision.

If you accept a tag identifying the latest stable revision, then you might 
as well not bother. The point of revision controlling a project is to be 
able to reconstruct previous states. If you allow any event, especially 
outside, unrelated, events to change the reconstructed state for a 
revision, then this is not the case. Your normal debugging situation will 
be "It's broken, and I didn't change anything." because someone somewhere 
else changed something, and you have no record of what last worked. And 
you can obviously forget any hope of "git bisect" working.
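
To make the point concrete, here is a toy Python sketch (not git code; the
revision ids, the tag name, and the checkout helper are all made up for
illustration) of a superproject commit that records an immutable id versus
one that records a tag which somebody later moves:

```python
# Toy model: revisions map ids to content, tags map names to revision ids.
sub_revisions = {"a1": "kernel v1", "b2": "kernel v2"}
sub_tags = {"stable": "a1"}


def checkout(recorded, revisions, tags):
    """Turn what a superproject commit recorded into subproject content."""
    # A tag name is resolved at checkout time; an id is used as-is.
    rev = tags.get(recorded, recorded)
    return revisions[rev]


pinned_commit = "a1"        # superproject commit records an immutable id
floating_commit = "stable"  # superproject commit records a tag name

before = (checkout(pinned_commit, sub_revisions, sub_tags),
          checkout(floating_commit, sub_revisions, sub_tags))

sub_tags["stable"] = "b2"   # an outside, unrelated event moves the tag

after = (checkout(pinned_commit, sub_revisions, sub_tags),
         checkout(floating_commit, sub_revisions, sub_tags))
```

Both styles reconstruct the same state in "before"; in "after" only the
commit that recorded the tag name has silently changed.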

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-16 10:44             ` Josef Weidendorfer
@ 2006-01-16 20:49               ` Junio C Hamano
  2006-01-17  5:46                 ` Daniel Barkalow
  2006-01-23  0:50                 ` Petr Baudis
  0 siblings, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-16 20:49 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: git

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:

> The suggested "bind" info in commit objects has the same problem
> as the original overlay: if the superproject already has a
> subdirectory kernel/, and there is an additional "bind" specification
> in commits also for kernel/, what should be done?
>
> So the gitlink object seems to be the only solution if we want to
> bind git versions of subprojects into a superproject.

In "pu", I have some of the necessary basic pieces for "bind"
approach, barely enough so that anybody interested could start
prototyping using them as building blocks.  It still has very
rough edges; the missing pieces include rev-list and fsck-objects, so
you cannot do a send-pack or fetch-pack yet.

Yesterday I was working on "gitlink" approach to have similar
core-side support for prototyping.  I haven't finished it into a
buildable state yet (it is not in "pu"), and I am pessimistic if
I ever will X-<.

I think the updated "bind" thing makes the two approaches
semantically equivalent (i.e. it does not allow an arbitrary
overlayed setup anymore).  We simply do not allow the
conflicting "bind".  So neither is the _only_ solution.  We
probably could make both work, but the details differ.

 * With "gitlink", the index of containing project never has
   subproject parts of the tree, which I see as an advantage
   compared to what "bind" does.  It only has one "gitlink"
   entry per subproject.  update-index, read-tree,
   ls-files, diff-*, etc. need to be aware of the "gitlink" object.
   Especially tricky is read-tree.  It needs to treat a
   "gitlink" object as a directory for D/F conflict detection
   purposes, but treat it similar to blobs in most other aspects
   (e.g.  results in one entry in the index).  The stat
   information update-index and diff-files uses for quick
   up-to-date check needs to be taught not to worry about the
   stat information of the subdirectory a "gitlink" object
   points at (e.g. if you do a whole-tree build, the timestamp
   of the directory would change, but that does not mean the
   subtree is dirty).  tree/directory traversal code needs to be
   aware of "gitlink" and stop there.  This approach involves
   quite a lot of code changes, mostly because what is in the
   current index never corresponds to a directory on the
   filesystem but "gitlink" quacks like a directory.

 * With "bind", the index of containing project keeps the entire
   tree structure, including subproject part.  In fact, there is
   no other separate index for the subproject part.

   An updated write-tree in "pu" can write a tree for only the
   subproject part with "write-tree --prefix=<path>/" from such
   an index file, and read-tree can read with "read-tree
   --prefix=<path>/" to graft a subproject tree on top of the
   current index contents.  Without the --prefix, write-tree
   writes out the whole thing for a commit for the containing
   project, so if somebody cloned that superproject, getting the
   whole tree out in order to "make" is just a matter of doing
   a regular "read-tree && checkout-index".

   We could introduce "bind the rest" to make write-tree write
   out a tree that contains only the containing project part and
   not any of the subproject part (e.g. Makefile, README and
   src/ but not linux-2.6/ nor gcc-4.0/ in the earlier example).
   Essentially the contents of such a tree object would be the
   same as what "gitlink" approach would have had for the
   containing project in the index file (minus "gitlink" entries
   themselves).  This is not so surprising, because the missing
   information "gitlink" approach recorded in the tree object
   itself is expressed on "bind" lines in the commit object with
   this approach.

An advantage with the "bind" approach, from the implementation
point of view, is that none of the "index vs working tree" part
of the core needs to be modified (you would notice that many
issues I had with trying "gitlink" I listed above are "index vs
working tree" issues).  "tree object vs index" part needed to be
enhanced somewhat (e.g. the re-rooting read-tree/write-tree with
the --prefix option) but it was not too painful.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-16 20:20                         ` Daniel Barkalow
@ 2006-01-16 22:25                           ` A Large Angry SCM
  0 siblings, 0 replies; 56+ messages in thread
From: A Large Angry SCM @ 2006-01-16 22:25 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Martin Langhoff, Junio C Hamano, git, Johannes Schindelin,
	Simon Richter

Daniel Barkalow wrote:
> On Mon, 16 Jan 2006, A Large Angry SCM wrote:
> 
>>Daniel Barkalow wrote:
>>[...]
>>>So the problem with handling subprojects with the build system is that it is
>>>too tempting to use the revision control system directly on the subproject,
>>>at which point the thing you're developing and testing isn't at all what
>>>other people will get if they check out your commit. You want "git status"
>>>to report it as an uncommitted change if you have a different revision of
>>>the subproject than your previous commit had, and it can't tell if this
>>>information is buried in the build system.
>>Using "git-status" is the wrong tool to use there. What you should be using is
>>"make project_status". Claiming "that it is too tempting to use the revision
>>control system on the subproject" is wrong; you should use the SCM (of the
>>subproject) to manage the subproject. You use the build system to manage the
>>_entire_ project.
> 
> I'm talking about using "git status" on the main project, in case you're 
> misunderstanding me. If you can manage the entire project with the build 
> system, then you don't need git or any version control at all, aside from 
> your build system. But you'd also lose the ability to use webgit, bisect, 
> gitk, git log, and so forth on the project as a whole.

When you say "main project", do you mean the top level project and all 
of its subprojects? Or just the top level project without any of its 
subprojects?

A build system and an SCM working together can be a very powerful tool, 
even if one or both of the components is not. Since managing a project 
with a build system and no SCM means that you don't have any history and 
since you seem to want access to the project's history, I'll ignore your 
statement. (Backups and directory snapshots are primitive SCMs.)

Consider the following:

1) For a project to be a sub-project, it must also be a project.

2) So the standard SCM tools will work on the project.

3) The super-project is also a project and the standard SCM tools will 
work on it.

4) Projects managed by an SCM are only considered consistent and usable 
at specific points in that project's history; it may be every recorded 
point in its history or it may be just a few specific recorded points.

5) A super-project only cares about specific states of its sub-projects, 
corresponding to points in the sub-project's history.

6) For each sub-project, the super-project needs to record the 
sub-project's state identifier for each recorded point in the 
super-project's history.

7) The super-project can record each sub-project's state identifier 
somewhere in the build system.

8) If the super-project's SCM is Git then webgit, bisect, gitk, git log, 
and so forth all work and will identify when and where the recorded 
state identifier of each sub-project is changed.

9) The information is available to the build system to permit using 
git-bisect in the super-project and notice that the breakage occurred 
when the recorded state identifier of a sub-project changed. If the 
sub-project used Git, the build system can automatically start 
git-bisect'ing in the sub-project.

10) The information is available to the build system to permit doing 
similar things for git-log and so forth.

11) Webgit and gitk are a little more work but by creating a git 
repository that contains all of the history and refs of each project in 
the entire project, you can navigate and view the history of any project 
in the entire project.

12) If the SCM of some of the sub-projects is not Git, the build system 
can still do git-bisect, git-log, etc. equivalents for the entire 
project (subject to the limitations of the SCMs involved).
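
Points 6) through 9) can be sketched roughly as follows. This is only an
illustration in Python under assumed names (first_bad, changed_subprojects
and the k1/g1 style identifiers are hypothetical, and a real build system
would run an actual test instead of the is_good stub):

```python
def first_bad(history, is_good):
    """Binary search for the first bad entry.

    Assumes history[0] is good, history[-1] is bad, and that the
    good-to-bad transition happens exactly once.
    """
    lo, hi = 0, len(history) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(history[mid]):
            lo = mid
        else:
            hi = mid
    return hi


def changed_subprojects(prev, cur):
    """Names of sub-projects whose recorded state identifier differs."""
    return sorted(name for name in cur if prev.get(name) != cur[name])


# One entry per super-project commit: the state identifier recorded
# for each sub-project at that point in the super-project's history.
history = [
    {"linux": "k1", "gcc": "g1"},
    {"linux": "k1", "gcc": "g2"},
    {"linux": "k2", "gcc": "g2"},
    {"linux": "k2", "gcc": "g3"},
]


def is_good(state):
    # Stub test: pretend the breakage arrived with linux state "k2".
    return state["linux"] == "k1"


bad = first_bad(history, is_good)
culprits = changed_subprojects(history[bad - 1], history[bad])
```

Once the culprit sub-project is known, the build system could hand the old
and new identifiers to that sub-project's own SCM to continue the search
there.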

>>>The tricky question is whether we should permit the "subproject" objects to
>>>specify a revision that isn't a hash, for use in identifying revisions of
>>>subprojects in other systems.
>>Why would you want to limit how required versions of subprojects are
>>specified? Your project policies and procedures may require that subprojects
>>be specified by a subproject SCM specific immutable revision but the policies
>>and procedures of other projects may not be so restrictive and could accept a
>>tag identifying the latest "stable" (or something) revision.
> 
> If you accept a tag identifying the latest stable revision, then you might 
> as well not bother. The point of revision controlling a project is to be 
> able to reconstruct previous states. If you allow any event, especially 
> outside, unrelated, events to change the reconstructed state for a 
> revision, then this is not the case. Your normal debugging situation will 
> be "It's broken, and I didn't change anything." because someone somewhere 
> else changed something, and you have no record of what last worked. And 
> you can obviously forget any hope of "git bisect" working.

CVS does not have the concept of "hash" for use in identifying the state 
of a particular set of files but it does have tags and CVS tags work 
like Git tags. The form of identifier that is used to identify the 
particular state of interest of a sub-project is dependent on the 
policies and procedures of the super-project, and the SCM of the 
sub-project.

The need to reproduce a particular state at some time in the future is 
well understood, even in organizations that use tools that only support 
tags. Implying that it's not possible without using Git's immutable 
hashes as the state identifier is just wrong.

--

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-16 20:49               ` Junio C Hamano
@ 2006-01-17  5:46                 ` Daniel Barkalow
  2006-01-17  6:18                   ` Junio C Hamano
  2006-01-23  0:50                 ` Petr Baudis
  1 sibling, 1 reply; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-17  5:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josef Weidendorfer, git

On Mon, 16 Jan 2006, Junio C Hamano wrote:

>    We could introduce "bind the rest" to make write-tree write
>    out a tree that contains only the containing project part and
>    not any of the subproject part (e.g. Makefile, README and
>    src/ but not linux-2.6/ nor gcc-4.0/ in the earlier example).
>    Essentially the contents of such a tree object would be the
>    same as what "gitlink" approach would have had for the
>    containing project in the index file (minus "gitlink" entries
>    themselves).  This is not so surprising, because the missing
>    information "gitlink" approach recorded in the tree object
>    itself is expressed on "bind" lines in the commit object with
>    this approach.

So why not use the "bind" approach for the "index vs working tree" part, 
but write out "gitlink"-style tree objects? I think putting the info in 
the tree objects in the location the subproject would appear is nicer than 
having tree objects that tell only part of the story, and you don't have 
to worry about commits that stick a subproject on top of something in the 
tree.

In any case, I think it would be good to track where the subprojects are 
in some core state, and probably the right solution is to have special 
index entries for them, in addition to having their contents in the index. 
I'm not seeing a clear way to get from commit objects with "bind" lines to 
an index with the appropriate things read and back otherwise.

One idea I toyed with a while ago for the index/working tree 
implementation is having an index file per bound project, such that each 
project has a completely ordinary index file, and you just need to tell 
checkout-index where to write. This is especially cute because the index 
file for the superproject doesn't need to know about the subprojects at 
all; they're not in that index, and the working tree is just directories 
of untracked files. Not sure if this is a useful idea at this point or 
not.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-17  5:46                 ` Daniel Barkalow
@ 2006-01-17  6:18                   ` Junio C Hamano
  2006-01-17 14:09                     ` Petr Baudis
  2006-01-17 17:41                     ` Daniel Barkalow
  0 siblings, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-17  6:18 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Josef Weidendorfer, git

Daniel Barkalow <barkalow@iabervon.org> writes:

> So why not use the "bind" approach for the "index vs working tree" part, 
> but write out "gitlink"-style tree objects?

I said "index vs working tree" as a mere example, and never said
"gitlink" is easier (or at least as easy as "bind") for "tree
object vs index" or "tree object vs working tree through index".
In fact I suspect those parts also need to be changed fairly
heavily, and to be honest, I am not very much looking forward to
investigating the details.

> In any case, I think it would be good to track where the subprojects are 
> in some core state, and probably the right solution is to have special 
> index entries for them, in addition to having their contents in the index. 

Actually, the "special entry" was what I found out to be quite a
pain, if you mean to have "linux-2.6/" in the index and have it
used in some meaningful way.  Further hacking and prototyping
_might_ convince me otherwise, but I am not so optimistic at
this moment.

> I'm not seeing a clear way to get from commit objects with "bind" lines to 
> an index with the appropriate things read and back otherwise.

Here again I am thinking aloud, remembering the earlier example
of an embedded linux project that ships with linux-2.6 and
gcc-4.0, along with its own README and Makefile at the toplevel
and src/ for its own sources.  The tools at the tip of "pu"
should be able to let you do the following:

	$ git cat-file commit $such_toplevel_commit
	tree $tree
        parent $parent
        bind $primarysub /
        bind $linuxsub linux-2.6/
        bind $gccsub gcc-4.0/
	author A U Thor <author@example.com> 1137392543 -0800
	committer A U Thor <author@example.com> 1137392543 -0800

        An example.

where $tree is the object name of the whole tree (no "gitlink"
object), $primarysub and $linuxsub are the object names of
commit objects for the primary subproject (which sits at the
rootlevel) and another subproject (which sits at linux-2.6/
subdirectory).

To make sure there is no misunderstanding:

	* "git-ls-tree $tree" would show the object name of
          $linuxsub^{tree} at path "linux-2.6/" because
          "tree" line of a commit describes the whole tree,
          including subprojects.

	* "git-ls-tree $primarysub" would show README,
          Makefile and src/ directories but not linux-2.6/ nor
          gcc-4.0/.

	* "git-ls-tree $linuxsub" would show COPYING, Makefile
          etc., not linux-2.6/COPYING.

Reading such a commit is easy:

	$ git-read-tree $tree ;# ;-)

But that is cheating.  Constructing such an index can be done by:

	$ git-read-tree $primarysub
        $ git-read-tree --prefix=linux-2.6/ $linuxsub
        $ git-read-tree --prefix=gcc-4.0/ $gccsub

When you have such an index, writing out the various trees is:

	$ git-write-tree ;# $tree
	$ git-write-tree --prefix=linux-2.6/ ;# $linuxsub^{tree}
	$ git-write-tree --prefix=gcc-4.0/ ;# $gccsub^{tree}
	$ git-write-tree \
          --bound=linux-2.6/ --bound=gcc-4.0/ ;# $primarysub^{tree}

The decision to use what --prefix and --bound and what tree(s)
to write out must come from somewhere, and as you say it would
be nice if we _could_ stick them in the index as "special
entries", but for the purpose of prototyping I am assuming I
keep that somewhere in $GIT_DIR/ (the "mtab" in the previous
message.  Maybe "$GIT_DIR/bind" is a good name?).
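
As a rough illustration of the re-rooting described above, here is a toy
Python model that treats the index as a flat path-to-blob mapping. It is
not how git stores trees, the blob names are invented, and the bound
argument only mimics the prototype --bound option from this message:

```python
def read_tree(index, tree, prefix=""):
    """Graft a flat tree {path: blob} into the index under prefix."""
    for path, blob in tree.items():
        index[prefix + path] = blob


def write_tree(index, prefix="", bound=()):
    """Return the subtree under prefix, re-rooted so prefix is stripped.

    Paths below any directory listed in bound are left out.
    """
    out = {}
    for path, blob in index.items():
        if not path.startswith(prefix):
            continue
        if any(path.startswith(b) for b in bound):
            continue
        out[path[len(prefix):]] = blob
    return out


# Hypothetical contents of the three subprojects.
primary = {"README": "r1", "Makefile": "m1", "src/main.c": "s1"}
linux = {"COPYING": "c1", "Makefile": "m2"}
gcc = {"configure": "x1"}

index = {}
read_tree(index, primary)
read_tree(index, linux, prefix="linux-2.6/")
read_tree(index, gcc, prefix="gcc-4.0/")

whole = write_tree(index)                                   # like $tree
linux_part = write_tree(index, prefix="linux-2.6/")         # like $linuxsub^{tree}
primary_part = write_tree(index, bound=("linux-2.6/", "gcc-4.0/"))
```

In this model linux_part comes back with paths such as COPYING rather than
linux-2.6/COPYING, matching the ls-tree expectations listed earlier.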

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-17  6:18                   ` Junio C Hamano
@ 2006-01-17 14:09                     ` Petr Baudis
  2006-01-17 16:45                       ` Daniel Barkalow
  2006-01-17 17:41                     ` Daniel Barkalow
  1 sibling, 1 reply; 56+ messages in thread
From: Petr Baudis @ 2006-01-17 14:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, Josef Weidendorfer, git

Dear diary, on Tue, Jan 17, 2006 at 07:18:47AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Here again I am thinking aloud, remembering the earlier example
> of an embedded linux project that ships with linux-2.6 and
> gcc-4.0, along with its own README and Makefile at the toplevel
> and src/ for its own sources.  The tools at the tip of "pu"
> should be able to let you do the following:
> 
> 	$ git cat-file commit $such_toplevel_commit
> 	tree $tree
>         parent $parent
>         bind $primarysub /
>         bind $linuxsub linux-2.6/
>         bind $gccsub gcc-4.0/
> 	author A U Thor <author@example.com> 1137392543 -0800
> 	committer A U Thor <author@example.com> 1137392543 -0800
> 
>         An example.
> 
> where $tree is the object name of the whole tree (no "gitlink"
> object), $primarysub and $linuxsub are the object names of
> commit objects for the primary subproject (which sits at the
> rootlevel) and another subproject (which sits at linux-2.6/
> subdirectory).

I perhaps missed this in the thread, but is it really so useful to bind
the subprojects to specific commits? If you care about reproducing
specific configuration, all you have to do is tag and seek recursively -
and even having a separate tiny git branch tracking just a single file
listing the commit ids of subprojects seems more elegant to me than just
forcing the specific commit ids. In the general case, I think it most
usually goes "this project#branch, the latest commit you can get", so
I'm not really convinced that you are optimizing for the right case at
all.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-17 14:09                     ` Petr Baudis
@ 2006-01-17 16:45                       ` Daniel Barkalow
  2006-01-17 17:33                         ` Craig Schlenter
  2006-01-17 17:38                         ` Linus Torvalds
  0 siblings, 2 replies; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-17 16:45 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Josef Weidendorfer, git

On Tue, 17 Jan 2006, Petr Baudis wrote:

> I perhaps missed this in the thread, but is it really so useful to bind
> the subprojects to specific commits? If you care about reproducing
> specific configuration, all you have to do is tag and seek recursively -
> and even having a separate tiny git branch tracking just a single file
> listing the commit ids of subprojects seems more elegant to me than just
> forcing the specific commit ids. In the general case, I think it most
> usually goes "this project#branch, the latest commit you can get", so
> I'm not really convinced that you are optimizing for the right case at
> all.

Think from a debugging standpoint. You know that the main project worked 
with a particular commit of the superproject. The bug you've found is 
related to the behavior of one of the subprojects in the context of 
your superproject, but you don't know this. In order to reproduce the 
working version and search for the change that broke things, you need to 
be able to identify which commits of subprojects were used in each commit 
of the superproject; these are almost certainly not the latest commits on 
any branch of the subproject. And if you're going to want to debug things 
later, no commit of the superproject can just say to use the latest in the 
subproject. You don't know when you make a commit whether it will turn out 
to be a configuration that you'll want to recreate later.

Now it may be useful to have a tool to update all of the subprojects to 
the latest versions, similar in end-user usage to pulling repositories, 
but you still need to generate new commits when you do this, rather than 
reinterpreting the old commits with new content, so that you keep the 
history immutable. You also want to know when you've done this, so that 
you don't clone a tree that's working fine and build it only to find that 
the clone has fetched a new kernel version with different behavior without 
letting you know that anything has changed.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-17 16:45                       ` Daniel Barkalow
@ 2006-01-17 17:33                         ` Craig Schlenter
  2006-01-17 17:38                         ` Linus Torvalds
  1 sibling, 0 replies; 56+ messages in thread
From: Craig Schlenter @ 2006-01-17 17:33 UTC (permalink / raw)
  To: git

Hi

For reference, here is what subversion does:

http://svnbook.red-bean.com/nightly/en/svn.advanced.externals.html

--Craig

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-17 16:45                       ` Daniel Barkalow
  2006-01-17 17:33                         ` Craig Schlenter
@ 2006-01-17 17:38                         ` Linus Torvalds
  1 sibling, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2006-01-17 17:38 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, Junio C Hamano, Josef Weidendorfer, git

On Tue, 17 Jan 2006, Daniel Barkalow wrote:
> 
> Think from a debugging standpoint. You know that the main project worked 
> with a particular commit of the superproject.

Yes, there are real advantages to being able to tag a very specific 
version of a tree.

You can do it manually (ie tag the versions of everything used), but 
there's a real convenience to being able to say "I want the tree to look 
exactly as it looked for our internal test-release that we shipped as a 
pre-view to customer so-and-so".

You can do it with ad-hoc build rules inside a company, but the likelihood 
that they don't work all the time is pretty high. Somebody forgot to 
follow the right procedure, and had updated a sub-tree without marking it, 
and now you can't reproduce the problem that a customer has with a debug 
build, because you have no way to reproduce the exact binary...

It's why people tag every file for huge trees under CVS for a release, and 
accept that building a release may take hours. It's crazy, yes, but there 
are other projects than just the BSDs that have that "World" mentality, 
where they want every single program under _one_ umbrella, so that they 
can tag them all together.

Me, I think it's crazy engineering ("if you can't reproduce it with 
individual projects, you're not doing programming, you're doing Voodoo"), 
but it's something that some organizations simply require.

Now, it might be enough with a cogito approach of ".git/subprojects", and 
just _version-control_ it in the top-level project, but then you'd need to 
make sure that all the tools automatically update the version when they do 
a "pull" or a "commit" on a subproject. But then it almost boils down to 
"gitlink"s after all.

		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: RFC: Subprojects
  2006-01-17  6:18                   ` Junio C Hamano
  2006-01-17 14:09                     ` Petr Baudis
@ 2006-01-17 17:41                     ` Daniel Barkalow
  2006-01-18  1:41                       ` Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-17 17:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josef Weidendorfer, git

On Mon, 16 Jan 2006, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > So why not use the "bind" approach for the "index vs working tree" part, 
> > but write out "gitlink"-style tree objects?
> 
> I said "index vs working tree" as a mere example, and never said
> "gitlink" is easier (or at least as easy as "bind") for "tree
> object vs index" or "tree object vs working tree through index".
> In fact I suspect those parts also need to be changed fairly
> heavily, and to be honest, I am not very much looking forward to
> investigating the details.

I suspect that it's just as easy, except that you get confronted 
immediately with the issues that you haven't dealt with in the bind 
approach (mentioned below). If you had parse_tree_buffer() just ignore 
them, and had write-tree take a list of bind lines, that would match the 
status of your "bind" implementation, I think (except for the part you say 
is cheating).

Incidentally, I don't think we'd want "gitlink" objects with the "gitlink" 
approach; we'd want trees to contain commit objects for subprojects. The 
"gitlink" thing that corresponds to ".git/HEAD" isn't an object, it's a 
tree entry, which, like ".git/HEAD" (or, more appropriately, 
".git/refs/heads/something") maps a name to the hash of a commit object.

> > In any case, I think it would be good to track where the subprojects are 
> > in some core state, and probably the right solution is to have special 
> > index entries for them, in addition to having their contents in the index. 
> 
> Actually, the "special entry" was what I found out to be quite a
> pain, if you mean to have "linux-2.6/" in the index and have it
> used in some meaningful way.  Further hacking and prototyping
> _might_ convince me otherwise, but I am not so optimistic at
> this moment.

Hmm... maybe libification should go ahead of subprojects. If access to the 
index weren't so often open-coded, it would just be a matter of having 
these entries in the data structure, but not actually returned by any 
current call, and it would be just like they were in some other structure. 

Actually, it should be easy to have them in the index file but not in the 
main index data structure, by skipping over them in the for loop near the 
end of read_cache(). Put them in a separate structure, and write them back 
to the file in write_cache(), and have a different method entirely for 
changing them, and they shouldn't affect the normal use of the index.
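
A minimal sketch of that idea, in Python rather than C and with an
invented "kind" field (the real index format has no such field, and these
functions only borrow the names read_cache and write_cache):

```python
def read_cache(file_entries):
    """Split raw index-file entries into normal entries and bind entries.

    Only the first list would be visible to existing index users; the
    second is the separate structure for subproject locations.
    """
    cache, binds = [], []
    for entry in file_entries:
        if entry["kind"] == "bind":
            binds.append(entry)
        else:
            cache.append(entry)
    return cache, binds


def write_cache(cache, binds):
    """Merge both structures back into one path-sorted entry list."""
    return sorted(cache + binds, key=lambda e: e["path"])


# Hypothetical on-disk entries, already sorted by path.
file_entries = [
    {"path": "Makefile", "kind": "file"},
    {"path": "linux-2.6/", "kind": "bind"},
    {"path": "linux-2.6/COPYING", "kind": "file"},
]

cache, binds = read_cache(file_entries)
round_trip = write_cache(cache, binds)
```

Code that walks cache never sees the bind entry, yet writing the file back
preserves it.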

> > I'm not seeing a clear way to get from commit objects with "bind" lines to 
> > an index with the appropriate things read and back otherwise.
> 
> Here again I am thinking aloud, remembering the earlier example
> of an embedded linux project that ships with linux-2.6 and
> gcc-4.0, along with its own README and Makefile at the toplevel
> and src/ for its own sources.  The tools at the tip of "pu"
> should be able to let you do the following:
> 
> 	$ git cat-file commit $such_toplevel_commit
> 	tree $tree
>         parent $parent
>         bind $primarysub /
>         bind $linuxsub linux-2.6/
>         bind $gccsub gcc-4.0/
> 	author A U Thor <author@example.com> 1137392543 -0800
> 	committer A U Thor <author@example.com> 1137392543 -0800
> 
>         An example.
> 
> where $tree is the object name of the whole tree (no "gitlink"
> object), $primarysub and $linuxsub are the object names of
> commit objects for the primary subproject (which sits at the
> rootlevel) and another subproject (which sits at linux-2.6/
> subdirectory).
> 
> To make sure there is no misunderstanding:
> 
> 	* "git-ls-tree $tree" would show the object name of
>           $linuxsub^{tree} at path "linux-2.6/" because
>           "tree" line of a commit describes the whole tree,
>           including subprojects.
> 
> 	* "git-ls-tree $primarysub" would show README,
>           Makefile and src/ directories but not linux-2.6/ nor
>           gcc-4.0/.
> 
> 	* "git-ls-tree $linuxsub" would show COPYING, Makefile
>           etc., not linux-2.6/COPYING.

Side issue here: this implies that the kernel objects are in the 
superproject's repository, or at least accessible from it. So prune has to 
not remove them. So, if you've committed changes to a subproject but not 
yet committed the fact that you want to use the changed subproject into 
the superproject, fsck-objects has to find them somewhere.

> Reading such a commit is easy:
> 
> 	$ git-read-tree $tree ;# ;-)
> 
> But that is cheating.  

This is for backwards compatibility, I assume?

> Constructing such an index can be done by:
>
> 	$ git-read-tree $primarysub
>         $ git-read-tree --prefix=linux-2.6/ $linuxsub
>         $ git-read-tree --prefix=gcc-4.0/ $gccsub
> 
> When you have such an index, the various trees are written out by:
> 
> 	$ git-write-tree ;# $tree
> 	$ git-write-tree --prefix=linux-2.6/ ;# $linuxsub^{tree}
> 	$ git-write-tree --prefix=gcc-4.0/ ;# $gccsub^{tree}
> 	$ git-write-tree \
>           --bound=linux-2.6/ --bound=gcc-4.0/ ;# $primarysub^{tree}
> 
> The decision to use what --prefix and --bound and what tree(s)
> to write out must come from somewhere, and as you say it would
> be nice if we _could_ stick them in the index as "special
> entries", but for the purpose of prototyping I am assuming I
> keep that somewhere in $GIT_DIR/ (the "mtab" in the previous
> message.  Maybe "$GIT_DIR/bind" is a good name?).

The hard thing here is getting the commits for the trees. The bind lines 
need commits, which means either identifying that we already have the 
correct commit object, because we didn't change anything in the 
subproject, or generating a new commit object with some message and the 
right parent. And we want to use commit objects, not tree objects, in the 
bind lines, so that, once we track a problem to the change of which commit 
is bound, we can treat the subproject as a project and debug it with 
bisect, rather than just having one tree that works and one that doesn't.

	-Daniel
*This .sig left intentionally blank*


* Re: RFC: Subprojects
  2006-01-17 17:41                     ` Daniel Barkalow
@ 2006-01-18  1:41                       ` Junio C Hamano
  2006-01-18  3:49                         ` Junio C Hamano
  2006-01-18 18:21                         ` Daniel Barkalow
  0 siblings, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-18  1:41 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Josef Weidendorfer, git

Daniel Barkalow <barkalow@iabervon.org> writes:

> Incidentally, I don't think we'd want "gitlink" objects with the "gitlink" 
> approach; we'd want trees to contain commit objects for subprojects. The 
> "gitlink" thing that corresponds to ".git/HEAD" isn't an object, it's a 
> tree entry, which, like ".git/HEAD" (or, more appropriately, 
> ".git/refs/heads/something") maps a name to the hash of a commit object.

> Hmm... maybe libification should go ahead of subprojects. If access to the 
> index weren't so often open-coded, it would just be a matter of having 
> these entries in the data structure, but not actually returned by any 
> current call, and it would be just like they were in some other structure. 

And libification has been waiting for the core to settle ;-) We
have to start somewhere.

> Actually, it should be easy to have them in the index file but not in the 
> main index data structure, by skipping over them in the for loop near the 
> end of read_cache()....

Yeah, I guess I was vaguely thinking along those lines while I
was driving to work this morning.  I appreciate your spelling it
out to make things clearer.

> Side issue here: this implies that the kernel objects are in the 
> superproject's repository, or at least accessible from it. So prune has to 
> not remove them. So, if you've committed changes to a subproject but not 
> yet committed the fact that you want to use the changed subproject into 
> the superproject, fsck-objects has to find them somewhere.

Yes.  I was planning to have "$GIT_DIR/bind" that says:

	master kernel=linux-2.6/ gcc=gcc-4.0/

meaning:

	The project tracked by the "master" branch binds the
	project tracked by the "kernel" branch as its subproject
	at its linux-2.6/ subdirectory.

or something like that, so when you make a commit, you update
those other branches as needed.  You already raised that issue
at the end of your message, and I will explain how I think that
can/should be done as a response to that part later.
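
A minimal Python sketch of reading one such line, assuming the format
stays exactly as in the example above (which is only "something like
that" at this point). The function name parse_bind_line and the rule
that every bound directory ends in "/" are assumptions made for
illustration.

```python
def parse_bind_line(line):
    """Parse 'master kernel=linux-2.6/ gcc=gcc-4.0/' into
    (superproject_branch, {subproject_branch: directory})."""
    fields = line.split()
    if not fields:
        raise ValueError("empty bind line")
    top, bindings = fields[0], {}
    for field in fields[1:]:
        branch, sep, path = field.partition("=")
        # Assumed rule: each binding is branch=dir/ with a trailing slash.
        if not sep or not branch or not path.endswith("/"):
            raise ValueError("malformed binding: %r" % field)
        bindings[branch] = path
    return top, bindings
```

For the example line this yields "master" together with a mapping from
"kernel" to "linux-2.6/" and from "gcc" to "gcc-4.0/".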

>> Reading such a commit is easy:
>> 
>> 	$ git-read-tree $tree ;# ;-)
>> 
>> But that is cheating.  
>
> This is for backwards compatibility, I assume?

This is done more for not having to touch *anything* that does
"index vs working file", "tree vs index" and "tree vs working
file via index".  It also is the easiest way to keep the "a
commit object name can be used in place of the tree object name
of the tree it contains" invariant.  Also I suspect this
organization might help recursive subprojects, but if it is the
case, that is just a byproduct, not a design goal.

>> When you have such an index, the various trees are written out by:
>> 
>> 	$ git-write-tree ;# $tree
>> 	$ git-write-tree --prefix=linux-2.6/ ;# $linuxsub^{tree}
>> 	$ git-write-tree --prefix=gcc-4.0/ ;# $gccsub^{tree}
>> 	$ git-write-tree \
>>           --bound=linux-2.6/ --bound=gcc-4.0/ ;# $primarysub^{tree}
>
> The hard thing here is getting the commits for the trees. The bind lines 
> need commits, which means either identifying that we already have the 
> correct commit object, because we didn't change anything in the 
> subproject, or generating a new commit object with some message and the 
> right parent. And we want to use commit objects, not tree objects, in the 
> bind lines, so that, once we track a problem to the change of which commit 
> is bound, we can treat the subproject as a project and debug it with 
> bisect, rather than just having one tree that works and one that doesn't.

Your wording "get the commit" is a bit misleading.  Even when
the tree for a subproject happens to match a commit in the
subproject in a distant past, we would not want to use it unless
the user explicitly asked for it.  IOW, we do not actively go
and look for a commit.

Our subproject tree either matches the subproject branch head,
in which case we just reuse it, or we make a new commit on top
of that ourselves.

Let's say my project breaks with the latest kernel, and I
suspect that it would work with v2.6.13 sources.  To test that
theory, I could:

        $ git branch -f kernel v2.6.13 ;# rewind

	$ git ls-files linux-2.6/ |
          xargs git update-index --force-remove
        $ git read-tree --prefix=linux-2.6/ -u kernel

to construct such a tree.  Maybe the latter two-command
"ls-files & read-tree --prefix" sequence deserves to become a
command, "git update-subproject kernel" [*1*].

The result may work as-is, or I may need to do some further
futzing in linux-2.6/ directory before the result works.  Once
the result starts working, I'd want to make a commit:

 - I compare the result of write-tree for linux-2.6/ portion and
   the tree object name contained in the head commit of the
   "kernel" branch.  If they match, then the current "kernel"
   branch head commit is what I'll place on the "bind" line in
   my commit; I do not have to make a new commit in the "kernel"
   subproject in this case.

 - If the tree object does not match the "kernel" head, that
   means I have tweaked the kernel part further, on top of
   v2.6.13.  So I make a commit for the kernel subproject (whose
   parent is obviously v2.6.13), update the kernel branch head
   with that commit, and then record that tip-of-the-tree commit
   for the subproject on the "bind" line in my commit for the
   toplevel.
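
The two cases above can be sketched in Python as follows. This is a toy
model, not git code: Commit is a hypothetical stand-in for a commit
object, and make_name stands in for whatever would produce the new
object name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str            # stand-in for the commit object name
    tree: str            # stand-in for the tree object name
    parents: tuple = ()  # names of parent commits


def commit_for_bind(head, written_tree, make_name):
    """Return the commit to record on the "bind" line.

    head is the current subproject branch head; written_tree is the
    result of write-tree for the subproject's part of the index."""
    if head.tree == written_tree:
        # Subproject is unchanged relative to its branch head: reuse it.
        return head
    # Subproject was tweaked: make a new commit on top of the head.
    # The caller would then advance the branch to this commit.
    return Commit(make_name(), written_tree, (head.name,))
```

Note that the decision only looks at the branch head and the written
tree; it never searches older history for a commit with a matching tree.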

Or let's say my project builds with the latest kernel (IOW, I
did not do the branch -f kernel in the above), and I made some
custom tweaks in the kernel area.  The above procedure would
result in a new commit on top of the latest kernel, update the
"kernel" branch head, and make a commit for the toplevel that
records the updated "kernel" branch head on its "bind" line.

Note that the above procedure did not use the commit object name
recorded on the "bind" line at all in either case.  From the
mechanism point of view, it is the right thing to do.  From the
usability point of view, however, we may want to take notice
that "bind" line commit and the bound branch head do not match,
and remind/warn the user about it.  If the reason why they are
different is because the user rewound the bound branch to use a
known working version, or made fixes in the subproject and
pulled the result into the bound branch (in which case there is
no funny rewinding involved), then this warning is
extraneous. But in the normal case of continuing to reuse the same
vintage of subprojects (and maybe making necessary adjustments
to subprojects while working on the main project), the commit
object on the "bind" line of the HEAD commit and bound branch
head should match.

[Footnote]

*1* One could also do a forward development on the kernel branch
in a separate working tree and fetch from there.  For example,
if our example "superproject" is in embed/ directory, and there
is a linux/ directory next to it to house a kernel repository,
we could:

        $ cd ../linux/
        $ edit && compile && test 
        $ git commit -m 'Fix for upstream, not just for embed'

to make an upstream fix, and then:

        $ cd ../embed/
        $ git fetch ../linux/ master:kernel

to update the "kernel" subproject branch head.  In such a case:

	$ git update-subproject kernel

would bring the subproject working tree and index up to date
with respect to the updated kernel branch.


* Re: RFC: Subprojects
  2006-01-18  1:41                       ` Junio C Hamano
@ 2006-01-18  3:49                         ` Junio C Hamano
  2006-01-18 11:47                           ` Alexander Litvinov
  2006-01-18 18:21                         ` Daniel Barkalow
  1 sibling, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2006-01-18  3:49 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Josef Weidendorfer, git

Junio C Hamano <junkio@cox.net> writes:

> Daniel Barkalow <barkalow@iabervon.org> writes:
>
>>> Reading such a commit is easy:
>>> 
>>> 	$ git-read-tree $tree ;# ;-)
>>> 
>>> But that is cheating.  
>>
>> This is for backwards compatibility, I assume?
>
> This is done more for not having to touch *anything* that does
> "index vs working file", "tree vs index" and "tree vs working
> file via index".  It also is the easiest way to keep the "a
> commit object name can be used in place of the tree object name
> of the tree it contains" invariant.  Also I suspect this
> organization might help recursive subprojects, but if it is the
> case, that is just a byproduct, not a design goal.

I started this "bind" design as a thought experiment, but I
started to like it more and more.

One interesting outcome of keeping the whole tree in the index
and the tree object recorded in the commit object of the
toplevel project is that a merge in the toplevel project "just
works".

To preserve our sanity, let's say we refuse to merge two commits
that have different sets of subprojects.  That is, they must
have the "bind" lines for the same set of subdirectories.  The
commits bound at these subdirectories do not need to match.

Before starting a merge, we require that the index is in sync
with the tree object recorded in the top commit, just like we do
for a normal merge[*1*].  Then we use the current merge
machinery that does not know anything about "bind" to perform
the merge, using the merge base of the toplevel project and
usual three-way merge.  From the mechanism point of view, there
is no need to look at commits on "bind" line of either side to
come up with the resulting tree.

We could notice that the commit bound at linux-2.6/ subdirectory
of one side is v2.6.15 and the other side is v2.6.16-rc1, and
because one is a fast-forward of the other, choose to pick the
tree associated with v2.6.16-rc1 commit without actually doing
the 3-way resolve of linux-2.6/ subtree part, but that is purely
a performance optimization [*2*].

When writing out the merge result as a commit, we would create
(this is the fun part) a commit for linux-2.6/ part that has two
parents: the commits bound to linux-2.6/ tree from the two
toplevel commits being merged are the parents of such a
subproject commit.  And the resulting toplevel merge commit
would have that commit object name on its "bind" line.
Obviously, when the bound subproject head of one side is a
fast-forward of the other, we do not create such a merge commit
for the subproject; instead, we just record the one that is
ahead on the "bind" line of the resulting toplevel merge commit.
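
A minimal Python sketch of that rule, under the assumption that history
is available as a plain mapping from commit name to parent names. The
helper names and the ("merge", a, b) placeholder are hypothetical; they
only mark where a two-parent subproject commit would be created.

```python
def is_ancestor(parents, old, new):
    """True if `old` is reachable from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return False


def merge_binds(parents, ours, theirs):
    """ours/theirs map bound directory -> commit name.

    Returns directory -> commit name for fast-forward cases, or
    directory -> ("merge", a, b) where a new subproject merge commit
    with parents a and b would be needed."""
    if set(ours) != set(theirs):
        raise ValueError("sides bind different sets of subprojects")
    result = {}
    for path in sorted(ours):
        a, b = ours[path], theirs[path]
        if is_ancestor(parents, a, b):
            result[path] = b      # theirs is ahead (or equal)
        elif is_ancestor(parents, b, a):
            result[path] = a      # ours is ahead
        else:
            result[path] = ("merge", a, b)
    return result
```

The check on the key sets corresponds to refusing to merge commits
whose "bind" lines name different subdirectories.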

[Footnote]

*1* As a side effect, this also ensures the index is in sync
with the bound commits of the subprojects.  As an additional
requirement, we may want to enforce that the bound commits must
match the branch heads that keep track of subprojects.

*2* Of course, from the usability, safety and confusion
avoidance point of view, it _might_ make sense to require that
bound commits are in such fast-forward relationship.  But that
is a policy issue; at the mechanism level, there is no need to
impose such a requirement.


* Re: RFC: Subprojects
  2006-01-18  3:49                         ` Junio C Hamano
@ 2006-01-18 11:47                           ` Alexander Litvinov
  2006-01-18 13:29                             ` Andreas Ericsson
  2006-01-18 17:06                             ` Junio C Hamano
  0 siblings, 2 replies; 56+ messages in thread
From: Alexander Litvinov @ 2006-01-18 11:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> I started this "bind" design as a thought experiment, but I
> started to like it more and more.
>

Is there a version of git with this that I can try?


* Re: RFC: Subprojects
  2006-01-18 11:47                           ` Alexander Litvinov
@ 2006-01-18 13:29                             ` Andreas Ericsson
  2006-01-18 17:06                             ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Andreas Ericsson @ 2006-01-18 13:29 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git

Alexander Litvinov wrote:
>>I started this "bind" design as a thought experiment, but I
>>started to like it more and more.
>>
> 
> 
> Is there a version of git with this that I can try?

The pu branch of the official git repo.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: RFC: Subprojects
  2006-01-18 11:47                           ` Alexander Litvinov
  2006-01-18 13:29                             ` Andreas Ericsson
@ 2006-01-18 17:06                             ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-01-18 17:06 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: git

Alexander Litvinov <lan@ac-sw.com> writes:

>> I started this "bind" design as a thought experiment, but I
>> started to like it more and more.
>>
>
> Is there a version of git with this that I can try?

	http://article.gmane.org/gmane.comp.version-control.git/14760

As I warned there, it still has quite rough edges, so do not use
it on your production repositories.


* Re: RFC: Subprojects
  2006-01-18  1:41                       ` Junio C Hamano
  2006-01-18  3:49                         ` Junio C Hamano
@ 2006-01-18 18:21                         ` Daniel Barkalow
  2006-01-18 18:49                           ` Junio C Hamano
  2006-01-23  1:22                           ` Petr Baudis
  1 sibling, 2 replies; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-18 18:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josef Weidendorfer, git

On Tue, 17 Jan 2006, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > Incidentally, I don't think we'd want "gitlink" objects with the "gitlink" 
> > approach; we'd want trees to contain commit objects for subprojects. The 
> > "gitlink" thing that corresponds to ".git/HEAD" isn't an object, it's a 
> > tree entry, which, like ".git/HEAD" (or, more appropriately, 
> > ".git/refs/heads/something") maps a name to the hash of a commit object.
> 
> > Hmm... maybe libification should go ahead of subprojects. If access to the 
> > index weren't so often open-coded, it would just be a matter of having 
> > these entries in the data structure, but not actually returned by any 
> > current call, and it would be just like they were in some other structure. 
> 
> And libification has been waiting for the core to settle ;-) We
> have to start somewhere.

Well, we could do a pass at cleaning, organizing, and documenting the 
internals, which is sort of the start to each of them.

> > Side issue here: this implies that the kernel objects are in the 
> > superproject's repository, or at least accessible from it. So prune has to 
> > not remove them. So, if you've committed changes to a subproject but not 
> > yet committed the fact that you want to use the changed subproject into 
> > the superproject, fsck-objects has to find them somewhere.
> 
> Yes.  I was planning to have "$GIT_DIR/bind" that says:
> 
> 	master kernel=linux-2.6/ gcc=gcc-4.0/
> 
> meaning:
> 
> 	The project tracked by the "master" branch binds the
> 	project tracked by the "kernel" branch as its subproject
> 	at its linux-2.6/ subdirectory.
> 
> or something like that, so when you make a commit, you update
> those other branches as needed.  You already raised that issue
> at the end of your message, and I will explain how I think that
> can/should be done as a response to that part later.

Okay, so you're using additional branch heads in the superproject to track 
the current state of the subprojects. That makes sense, although I think 
it would confuse people less if they were held separately. IIRC, 
refs/subprojects/kernel/heads/master is a perfectly good ref name these 
days, so that might be a good idea. That would also mean that 
refs/tags/v2.6.14 and refs/tags/v2.7.2.3 wouldn't get confused (being 
linux and gcc tags, respectively), because they'd be under the appropriate 
subprojects.
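
A small Python sketch of that naming idea. Only the name
refs/subprojects/kernel/heads/master comes from the suggestion above;
the function subproject_ref and the extension of the same layout to
tags are assumptions.

```python
def subproject_ref(subproject, ref):
    """Map a subproject's own ref (e.g. 'refs/tags/v2.6.14') into a
    per-subproject namespace inside the superproject repository."""
    prefix = "refs/"
    if not ref.startswith(prefix):
        raise ValueError("not a ref name: %r" % ref)
    return "refs/subprojects/%s/%s" % (subproject, ref[len(prefix):])
```

With this mapping the linux tag and the gcc tag end up under
refs/subprojects/kernel/tags/ and refs/subprojects/gcc/tags/
respectively, so they cannot collide.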

I assume these get updated by checkout when you check out the commit with 
them as bind lines?

> >> Reading such a commit is easy:
> >> 
> >> 	$ git-read-tree $tree ;# ;-)
> >> 
> >> But that is cheating.  
> >
> > This is for backwards compatibility, I assume?
> 
> This is done more for not having to touch *anything* that does
> "index vs working file", "tree vs index" and "tree vs working
> file via index".  It also is the easiest way to keep the "a
> commit object name can be used in place of the tree object name
> of the tree it contains" invariant.  Also I suspect this
> organization might help recursive subprojects, but if it is the
> case, that is just a byproduct, not a design goal.

Ah, okay, so it's cheating for checkout, because checkout is supposed to 
understand everything, but not cheating for other things. I thought we 
decided that the stuff that doesn't know about subprojects sees them as 
opaque, rather than as their contents, so your toplevel git diff doesn't 
show you a million lines when you switch from linux-2.6.14 to 15.

> >> When you have such an index, the various trees are written out by:
> >> 
> >> 	$ git-write-tree ;# $tree
> >> 	$ git-write-tree --prefix=linux-2.6/ ;# $linuxsub^{tree}
> >> 	$ git-write-tree --prefix=gcc-4.0/ ;# $gccsub^{tree}
> >> 	$ git-write-tree \
> >>           --bound=linux-2.6/ --bound=gcc-4.0/ ;# $primarysub^{tree}
> >
> > The hard thing here is getting the commits for the trees. The bind lines 
> > need commits, which means either identifying that we already have the 
> > correct commit object, because we didn't change anything in the 
> > subproject, or generating a new commit object with some message and the 
> > right parent. And we want to use commit objects, not tree objects, in the 
> > bind lines, so that, once we track a problem to the change of which commit 
> > is bound, we can treat the subproject as a project and debug it with 
> > bisect, rather than just having one tree that works and one that doesn't.
> 
> Your wording "get the commit" is a bit misleading.  Even when
> the tree for a subproject happens to match a commit in the
> subproject in a distant past, we would not want to use it unless
> the user explicitly asked for it.  IOW, we do not actively go
> and look for a commit.

We don't search the history for just any commit, but we have to look 
somewhere...

> Our subproject tree either matches the subproject branch head,
> in which case we just reuse it, or we make a new commit on top
> of that ourselves.

I hadn't realized that the subprojects had branch heads. It makes much 
more sense that you'd expect to be able to just write out bind lines if 
you've got that information.

I thought we decided that committing the superproject wouldn't commit the 
subprojects. If this is what we decided, then the subproject tree is 
required to match the branch head, because we must have committed the 
subproject already (which is good, because otherwise the user will get 
confused about which commit message to write when).

> Let's say my project breaks with the latest kernel, and I
> suspect that it would work with v2.6.13 sources.  To test that
> theory, I could:
> 
>         $ git branch -f kernel v2.6.13 ;# rewind
> 
> 	$ git ls-files linux-2.6/ |
>           xargs git update-index --force-remove
>         $ git read-tree --prefix=linux-2.6/ -u kernel
> 
> to construct such a tree.  Maybe the latter two-command
> "ls-files & read-tree --prefix" sequence deserves to become a
> command, "git update-subproject kernel" [*1*].

Shouldn't "git read-tree --prefix=linux-2.6/ -u kernel" remove everything 
else in the index in linux-2.6 itself, making the "git update-index 
--force-remove" unnecessary?

> The result may work as-is, or I may need to do some further
> futzing in linux-2.6/ directory before the result works.  Once
> the result starts working, I'd want to make a commit:
> 
>  - I compare the result of write-tree for linux-2.6/ portion and
>    the tree object name contained in the head commit of the
>    "kernel" branch.  If they match, then the current "kernel"
>    branch head commit is what I'll place on the "bind" line in
>    my commit; I do not have to make a new commit in the "kernel"
>    subproject in this case.
> 
>  - If the tree object does not match the "kernel" head, that
>    means I have tweaked the kernel part further, on top of
>    v2.6.13.  So I make a commit for the kernel subproject (whose
>    parent is obviously v2.6.13), update the kernel branch head
>    with that commit, and then record that tip-of-the-tree commit
>    for the subproject on the "bind" line in my commit for the
>    toplevel.

Equivalently, you make a commit for the kernel subproject, update the 
branch head, and start over; the first case should apply now.

> Or let's say my project builds with the latest kernel (IOW, I
> did not do the branch -f kernel in the above), and I made some
> custom tweaks in the kernel area.  The above procedure would
> result in a new commit on top of the latest kernel, update the
> "kernel" branch head, and make a commit for the toplevel that
> records the updated "kernel" branch head on its "bind" line.
> 
> Note that the above procedure did not use the commit object name
> recorded on the "bind" line at all in either case.  From the
> mechanism point of view, it is the right thing to do.  From the
> usability point of view, however, we may want to take notice
> that "bind" line commit and the bound branch head do not match,
> and remind/warn the user about it.  If the reason why they are
> different is because the user rewound the bound branch to use a
> known working version, or made fixes in the subproject and
> pulled the result into the bound branch (in which case there is
> no funny rewinding involved), then this warning is
> extraneous. But in the normal case of continuing to reuse the same
> vintage of subprojects (and maybe making necessary adjustments
> to subprojects while working on the main project), the commit
> object on the "bind" line of the HEAD commit and bound branch
> head should match.

I hope people will want to prepare their commits to the kernel subproject 
as would be suitable for pushing to Linus, which would suggest that they'd 
tend to do a commit in the kernel subproject embedded in their 
superproject separately from doing the commit in the superproject, and 
so the branch head would match the index but not the bind line when they 
got to committing the superproject.

	-Daniel
*This .sig left intentionally blank*


* Re: RFC: Subprojects
  2006-01-18 18:21                         ` Daniel Barkalow
@ 2006-01-18 18:49                           ` Junio C Hamano
  2006-01-18 19:29                             ` Daniel Barkalow
  2006-01-23  1:22                           ` Petr Baudis
  1 sibling, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2006-01-18 18:49 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> I assume these get updated by checkout when you check out the commit with 
> them as bind lines?

I did not think of that when I wrote that message, but you are right.

> Ah, okay, so it's cheating for checkout, because checkout is supposed to 
> understand everything, but not cheating for other things.

Yeah; to put it another way, read-tree of the toplevel tree and
checking it out is equivalent to doing the skeleton read-tree
followed by --prefix read-tree of all the bound projects, so
checkout can optimize.

> ... I thought we 
> decided that the stuff that doesn't know about subprojects sees them as 
> opaque, rather than as their contents, so your toplevel git diff doesn't 
> show you a million lines when you switch from linux-2.6.14 to 15.

It was discussed in the context of "gitlink" approach as a way
to keep things simple.  In the "bind" approach, I am doing
things a bit differently, and this "toplevel has everything" is
one big difference.

> I thought we decided that committing the superproject wouldn't
> commit the subprojects.

I see it as a policy.  We can forbid the modification of the
subproject part of the index (i.e. detect and refuse to commit
and/or do "git reset --mixed" only for the subproject part) so
that the commit outlined in the "bind" approach does not _have_
to make a new commit, if you want to work that way.  But if
somebody else wants to make a related set of changes to the
superproject and bound subprojects, we _could_ allow a commit
per subproject.

> Shouldn't "git read-tree --prefix=linux-2.6/ -u kernel" remove everything 
> else in the index in linux-2.6 itself, making the "git update-index 
> --force-remove" unnecessary?

I agree that "-u" should imply that.  The current "read-tree
--prefix=linux-2.6/" in proposed updates refuses if linux-2.6/
appears in the original index.

> I hope people will want to prepare their commits to the kernel subproject 
> as would be suitable for pushing to Linus, which would suggest that they'd 
> tend to do a commit in the kernel subproject embedded in their 
> superproject separately from doing the commit in the superproject, and 
> so the branch head would match the index but not the bind line when they 
> got to committing the superproject.

Yes, that is the workflow I outlined in the footnote part you
did not quote.  I think it is cleaner to do things that way: to
have a separate, kernel-only repository+worktree and do pure
kernel work there, and fetch into the superproject branch that
keeps track of the kernel subproject in that superproject.

Having more than one working tree, each with a .git/ under which
everything except HEAD and index is symlinked to one copy, like
you do, would be a natural way to work.

	embed/.git/HEAD -> refs/heads/master

	embed/linux-2.6/.git/HEAD -> refs/heads/kernel
	embed/linux-2.6/.git/refs -> ../.git/refs
	embed/linux-2.6/.git/objects -> ../.git/objects

Then, after hacking on the collective whole to make the whole
thing work in "embed" directory, you would:

	$ cd linux-2.6
        $ git commit

to make a commit that can be sent to Linus, at the same time updating
the "kernel" branch.  Then come back to the toplevel, tell git
that you updated the "kernel" branch so it does not complain
that the "bind" in the HEAD commit does not match "kernel" head,
and make a toplevel commit.


* Re: RFC: Subprojects
  2006-01-18 18:49                           ` Junio C Hamano
@ 2006-01-18 19:29                             ` Daniel Barkalow
  0 siblings, 0 replies; 56+ messages in thread
From: Daniel Barkalow @ 2006-01-18 19:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, 18 Jan 2006, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > ... I thought we 
> > decided that the stuff that doesn't know about subprojects sees them as 
> > opaque, rather than as their contents, so your toplevel git diff doesn't 
> > show you a million lines when you switch from linux-2.6.14 to 15.
> 
> It was discussed in the context of "gitlink" approach as a way
> to keep things simple.  In the "bind" approach, I am doing
> things a bit differently, and this "toplevel has everything" is
> one big difference.

I thought that had been a question of what is best as an interface, but 
either is plausible.

> > I thought we decided that committing the superproject wouldn't
> > commit the subprojects.
> 
> I see it as a policy.  We can forbid the modification of the
> subproject part of the index (i.e. detect and refuse to commit
> and/or do "git reset --mixed" only for the subproject part) so
> that the commit outlined in the "bind" approach does not _have_
> to make a new commit, if you want to work that way.  But if
> somebody else wants to make a related set of changes to the
> superproject and bound subprojects, we _could_ allow a commit
> per subproject.

I think it makes most sense, for the purpose of consolidating code paths, 
if the superproject may only be committed with clean subprojects; the 
porcelain has the option of responding to unclean subprojects by 
committing them to make them clean, and then there is only a single case 
for how the superproject commit happens. I think it makes most sense as a 
command line option, like -a is; if you want to commit dirty subprojects, 
you use --subprojects, and it does that. If you're not expecting to need 
it, you won't start doing the wrong commit.
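
The single-code-path policy described above can be sketched in shell. This is only an illustration under stated assumptions: the function name plan_superproject_commit is hypothetical, the first argument stands in for whether the proposed --subprojects option was given, and the remaining arguments are the paths of dirty subprojects. It prints the planned steps rather than running any git command.

```shell
#!/bin/sh
# Hypothetical sketch: nothing here is an existing git interface.
# $1 is "yes" if the proposed --subprojects option was given, "no" otherwise.
# Remaining arguments are the paths of subprojects with uncommitted changes.
plan_superproject_commit() {
    commit_subprojects=$1
    shift
    if [ "$#" -eq 0 ]; then
        # All subprojects are clean: the one and only superproject code path.
        echo "commit superproject"
        return 0
    fi
    if [ "$commit_subprojects" = yes ]; then
        # Make each dirty subproject clean first, then fall into the same path.
        for sub in "$@"; do
            echo "commit subproject $sub"
        done
        echo "commit superproject"
        return 0
    fi
    # Dirty subprojects and no opt-in: refuse rather than do the wrong commit.
    echo "refuse: dirty subprojects: $*" >&2
    return 1
}

plan_superproject_commit yes linux-2.6
```

Without the opt-in, the same call with a dirty linux-2.6 returns non-zero and plans nothing, which matches the "you won't start doing the wrong commit" behaviour.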

> > I hope people will want to prepare their commits to the kernel subproject 
> > as would be suitable for pushing to Linus, which would suggest that they'd 
> > tend to do a commit in the kernel subproject embedded in their 
> > superproject separately from doing the commit in the superproject, and 
> > so the branch head would match the index but not the bind line when they 
> > got to committing the superproject.
> 
> Yes, that is the workflow I outlined in the footnote part you
> did not quote.  I think it is cleaner to do things that way: to
> have a separate, kernel-only repository+worktree and do pure
> kernel work there, and fetch into the superproject branch that
> keeps track of the kernel subproject in that superproject.

I actually meant that I expected people to go into superproject/linux-2.6, 
make changes, and commit there, using the place it appears in their 
superproject working tree as a working tree for the subproject, so the 
opposite of your footnote, but still doing the subproject commit as a step 
before the superproject commit.

For example, they might want to send the subproject changes upstream as a 
patch, get feedback, reset the subproject, do revised changes, commit 
that, get it merged upstream, and then commit the changes to the 
superproject, including in the message the fact that the changes have been 
pushed upstream. But they may still want to do this all within the working 
tree of the superproject, so that they can test their changes in context.

> Having more than one working tree with .git/, everything except
> HEAD and index undef which are symlinked to one copy, like you
> do, would be a natural way to work.
> 
> 	embed/.git/HEAD -> refs/heads/master
> 
> 	embed/linux-2.6/.git/HEAD -> refs/heads/kernel
> 	embed/linux-2.6/.git/refs -> ../.git/refs
> 	embed/linux-2.6/.git/objects -> ../.git/objects
> 
> Then, after hacking on the collective whole to make the whole
> thing work in "embed" directory, you would:
> 
> 	$ cd linux-2.6
> 	$ git commit
> 
> to make a commit that can be sent to Linus, at the same time updating
> the "kernel" branch.  Then come back to the toplevel, tell git
> that you updated the "kernel" branch so it does not complain
> that the "bind" in the HEAD commit does not match "kernel" head,
> and make a toplevel commit.

I'm not sure having a .git directory for a subproject inside a 
subdirectory of the superproject's working tree is all that good an idea, 
and I don't think it should be necessary in any case, because the toplevel 
index has all the information from the subproject index. The only thing 
needed would be having "git commit" notice what you're doing when you run 
it from a directory that's a subproject.

	-Daniel
*This .sig left intentionally blank*


* Re: RFC: Subprojects
  2006-01-16 20:49               ` Junio C Hamano
  2006-01-17  5:46                 ` Daniel Barkalow
@ 2006-01-23  0:50                 ` Petr Baudis
  1 sibling, 0 replies; 56+ messages in thread
From: Petr Baudis @ 2006-01-23  0:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josef Weidendorfer, git

Dear diary, on Mon, Jan 16, 2006 at 09:49:48PM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
>    We could introduce "bind the rest" to make write-tree write
>    out a tree that contains only the containing project part and
>    not any of the subproject part (e.g. Makefile, README and
>    src/ but not linux-2.6/ nor gcc-4.0/ in the earlier example).
>    Essentially the contents of such a tree object would be the
>    same as what "gitlink" approach would have had for the
>    containing project in the index file, minus "gitlink" entries
>    themselves).  This is not so surprising, because the missing
>    information "gitlink" approach recorded in the tree object
>    itself is expressed on "bind" lines in the commit object with
>    this approach.

Now, I must have missed the obvious again, but what is the point in
having the write-tree --exclude stuff? My impression (also from your
later mail in this thread) is that now the moment you introduce any
binds, your top-level development changes to "two-tiered" - the
top-level project and the meta-project holding it all together. I'd say
that's pretty confusing and I don't see big gain in this; the simplicity
of the original proposal was a lot more appealing.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: RFC: Subprojects
  2006-01-18 18:21                         ` Daniel Barkalow
  2006-01-18 18:49                           ` Junio C Hamano
@ 2006-01-23  1:22                           ` Petr Baudis
  1 sibling, 0 replies; 56+ messages in thread
From: Petr Baudis @ 2006-01-23  1:22 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, Josef Weidendorfer, git

Dear diary, on Wed, Jan 18, 2006 at 07:21:59PM CET, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> said that...
> Okay, so you're using additional branch heads in the superproject to track 
> the current state of the subprojects. That makes sense, although I think 
> it would confuse people less if they were held separately. IIRC, 
> refs/subprojects/kernel/heads/master is a perfectly good ref name these 
> days, so that might be a good idea. That would also mean that 
> refs/tags/v2.6.14 and refs/tags/v2.7.2.3 wouldn't get confused (being 
> linux and gcc tags, respectively), because they'd be under the appropriate 
> subprojects.

I passionately agree - this is the only thing I do not like about
Junio's current proposal (besides that top-level subproject confusion).
The way it is proposed, you are mixing different projects in a single
refs namespace and I think that's *really* confusing.

Besides, you are going to get a lot of complications since to do merging
properly you need two heads per subproject (its 'master' and 'origin'
heads; and it's useful to have e.g. all the upstream heads called
'origin' since then you can say cg-fetch -r origin in the superproject
and have all the subproject origins fetched as well) and you might want
to have other subproject heads as well. Now, for different superproject
heads, you want a separate set of subproject heads. You can see the
downward spiral from here, I guess... And multiply all that by two since
you also have tags.

It actually took me a short while to realize that keeping separate
subproject/.git/refs makes no sense precisely because for different
superproject heads, you want a different set of subproject refs.
So in line with Daniel's proposal, I'd propose:

	refs/subprojects/<superhead>/<subid>/heads/master

<superhead> is the name of the current HEAD (${#refs/heads/}). <subid>
is a little more tricky - this should be the part after the equal sign
in .git/mtab (or .git/binds or .git/subprojects or whichever is the name
of the day). Obviously, you can just figure out something, but I'd like
to assign this automagically.

OTOH, in Cogito I might as well just default to sha1 of something random
(e.g.  the path+commitid+time()) since I do not expect this to be
normally referenced by a human; I just intend to switch from refs/ to
refs/subprojects/<superhead>/<subid>/ when dealing with the subproject
exclusively. ($GIT_REF_DIR (by default $GIT_DIR/refs) would come in useful;
I'll probably whip up a patch when I get to finally need it.)
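
The ref layout proposed above can be sketched as a small shell helper. This is a sketch under assumptions, not anything git or Cogito implements: the function name subproject_ref_prefix is hypothetical, and <subid> is simply passed in by the caller instead of being read from a binds file.

```shell
#!/bin/sh
# Hypothetical helper for the proposed refs/subprojects/<superhead>/<subid>/
# namespace. No existing tool uses this layout.
subproject_ref_prefix() {
    head_ref=$1                         # full ref name, e.g. refs/heads/master
    subid=$2                            # subproject identifier, e.g. kernel
    superhead=${head_ref#refs/heads/}   # strip the leading "refs/heads/"
    printf 'refs/subprojects/%s/%s\n' "$superhead" "$subid"
}

subproject_ref_prefix refs/heads/master kernel
# prints: refs/subprojects/master/kernel
```

In a real repository the first argument could come from "git symbolic-ref HEAD"; the subproject's own heads and tags would then live below the printed prefix.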

> I hope people will want to prepare their commits to the kernel subproject 
> as would be suitable for pushing to Linus, which would suggest that they'd 
> tend to do a commit in the kernel subproject embedded in their 
> superproject separately from doing the commit in the superproject, and 
> so the branch head would match the index but not the bind line when they 
> got to committing the superproject.

FWIW, my idea is that it should be "a seamless experience for the user"
(tm) to do development in a subproject of another project, and I can see
no reason why that should be hard to do in any way.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: RFC: Subprojects
  2006-01-14  8:59       ` Junio C Hamano
  2006-01-14 19:16         ` Linus Torvalds
  2006-01-16  7:28         ` Alexander Litvinov
@ 2006-02-20 13:16         ` Uwe Zeisberger
  2006-02-21  7:57           ` Junio C Hamano
  2 siblings, 1 reply; 56+ messages in thread
From: Uwe Zeisberger @ 2006-02-20 13:16 UTC (permalink / raw)
  To: git

Hello,

Junio C Hamano wrote:
> The "containing" project would have a handful of "gitlink" objects
> among other things.  The toplevel tree object from a commit in
> such a project might look like this (mode bits 0160000 is
> S_IFDIR|S_IFLNK, which is what this thing is):
> 
>       $ git ls-tree HEAD
>         0100644 blob 012345... Makefile
>         0100644 blob 123456... README
>         0160000 link 234567... gcc-4.0
>         0160000 link 345678... linux-2.6
>         0040000 tree 456789... src
>       $ git cat-file -t 345678
>         link
>       $ git cat-file link 345678
>         commit 87530db5ec7d519c7ba334e414307c5130ae2da8
>         url git://...torvalds/linux-2.6.git/
> 
>         The upstream Linux 2.6 repository.
>       $ cd linux-2.6 && git-rev-parse --verify HEAD
>         87530db5ec7d519c7ba334e414307c5130ae2da8
> 
> URL will be used as a suggestion for people who cloned this tree
> to set up their repository.
I'd prefer to have the objects needed to get the linux-2.6 tree in the
object db of the containing project.  Then "url" is not needed, and you
could directly use the commit as value for the link.  i.e.

       $ git ls-tree HEAD
         0100644 blob 012345... Makefile
         0100644 blob 123456... README
         0160000 link 435363... gcc-4.0
         0160000 link 87530d... linux-2.6
         0040000 tree 456789... src

(You could now rename "link" to "commit", but it would break the
layout.)
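
The mode bits used in both listings can be checked with shell arithmetic. This sketch assumes the usual POSIX file-type values S_IFDIR = 0040000 and S_IFLNK = 0120000; the function name gitlink_mode is only for illustration.

```shell
#!/bin/sh
# Combine the directory and symlink file-type bits, as the quoted text
# describes for the "link" entries (assumed values: S_IFDIR=0040000,
# S_IFLNK=0120000), and print the result in octal.
gitlink_mode() {
    printf '%o\n' $(( 0040000 | 0120000 ))
}

gitlink_mode
# prints: 160000
```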

Moreover I prefer the link approach over the bind method.  The
reason is that binds use information from the commit object, rather than
only the tree, to build the wc.  Moreover the condition that the
"containing" tree must not have an entry named linux-2.6 is handled
implicitly with links.

Please correct me if I'm wrong somewhere.  It's some time ago I read the
patches and this thread.  This mail is the result of some thoughts in my
vacation.

Best regards
Uwe


-- 
Uwe Zeisberger

http://www.google.com/search?q=1+year+divided+by+3+in+seconds


* Re: RFC: Subprojects
  2006-02-20 13:16         ` Uwe Zeisberger
@ 2006-02-21  7:57           ` Junio C Hamano
  0 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2006-02-21  7:57 UTC (permalink / raw)
  To: Uwe Zeisberger; +Cc: git

Uwe Zeisberger <zeisberg@informatik.uni-freiburg.de> writes:

> I'd prefer to have the objects needed to get the linux-2.6 tree in the
> object db of the containing project.  Then "url" is not needed, and you
> could directly use the commit as value for the link.

... which is actually closer to what bind commit approach gives
you.  The tree object in a commit of the containing project has
the full tree object at path linux-2.6/.  The "bind" lines in
the commit object are just notes that tell you where those
trees happen to come from.

> ...  Moreover the condition that the
> "containing" tree must not have an entry named linux-2.6 is handled
> implicitly with links.

I had an impression that two approaches were more or less
equivalent, especially the last round of bound commit approach.
It does not let anything exist at the bound path in the
containing project either ("read-tree --prefix" rejects it).

> Please correct me if I'm wrong somewhere.  It's some time ago I read the
> patches and this thread.  This mail is the result of some thoughts in my
> vacation.

I have to admit that I haven't thought about the issues involved
for a long time, having no great need nor desire for subprojects
myself, and especially with more generally useful stuff like
performance enhancement for pack generation to occupy me.  I am
not sure I am much more qualified to comment than you are at
this point.

The bound commit lowlevel changes have been sitting in "pu" for
about a month by now, but nobody seems to be interested enough
to start prototyping Porcelain around it.  The same goes for the
gitlink approach.  After seeing not much interest on the list, I was
hoping that I could retire both WIPs.


end of thread, other threads:[~2006-02-21  7:57 UTC | newest]

Thread overview: 56+ messages
2006-01-11 15:58 RFC: Subprojects Simon Richter
2006-01-11 16:44 ` Johannes Schindelin
2006-01-11 16:52   ` Simon Richter
2006-01-11 17:42     ` Linus Torvalds
2006-01-11 19:43       ` Simon Richter
2006-01-11 20:06         ` Linus Torvalds
2006-01-14  8:59       ` Junio C Hamano
2006-01-14 19:16         ` Linus Torvalds
2006-01-14 19:32           ` A Large Angry SCM
2006-01-14 20:02             ` Linus Torvalds
2006-01-14 20:30               ` A Large Angry SCM
2006-01-14 20:38                 ` Junio C Hamano
2006-01-15  0:28                   ` Martin Langhoff
2006-01-15  0:49                     ` Junio C Hamano
2006-01-15  1:55                       ` Tom Prince
2006-01-16  5:06                     ` Daniel Barkalow
2006-01-16 19:08                       ` A Large Angry SCM
2006-01-16 20:20                         ` Daniel Barkalow
2006-01-16 22:25                           ` A Large Angry SCM
2006-01-16  7:48               ` Alex Riesen
2006-01-14 20:16           ` Junio C Hamano
2006-01-15  1:01             ` Junio C Hamano
2006-01-16 10:44             ` Josef Weidendorfer
2006-01-16 20:49               ` Junio C Hamano
2006-01-17  5:46                 ` Daniel Barkalow
2006-01-17  6:18                   ` Junio C Hamano
2006-01-17 14:09                     ` Petr Baudis
2006-01-17 16:45                       ` Daniel Barkalow
2006-01-17 17:33                         ` Craig Schlenter
2006-01-17 17:38                         ` Linus Torvalds
2006-01-17 17:41                     ` Daniel Barkalow
2006-01-18  1:41                       ` Junio C Hamano
2006-01-18  3:49                         ` Junio C Hamano
2006-01-18 11:47                           ` Alexander Litvinov
2006-01-18 13:29                             ` Andreas Ericsson
2006-01-18 17:06                             ` Junio C Hamano
2006-01-18 18:21                         ` Daniel Barkalow
2006-01-18 18:49                           ` Junio C Hamano
2006-01-18 19:29                             ` Daniel Barkalow
2006-01-23  1:22                           ` Petr Baudis
2006-01-23  0:50                 ` Petr Baudis
2006-01-16  7:28         ` Alexander Litvinov
2006-01-16 10:16           ` Andreas Ericsson
2006-02-20 13:16         ` Uwe Zeisberger
2006-02-21  7:57           ` Junio C Hamano
2006-01-12  3:19 ` Alexander Litvinov
2006-01-12  4:46   ` Martin Langhoff
2006-01-12  5:25     ` Alexander Litvinov
2006-01-12  5:39       ` Martin Langhoff
2006-01-12  8:36         ` Alexander Litvinov
2006-01-12  8:58           ` Alex Riesen
2006-01-12  7:20       ` Anand Kumria
2006-01-12 13:38     ` Daniel Barkalow
2006-01-15 15:07 ` [RFC][PATCH] Cogito support for simple subprojects Petr Baudis
2006-01-15 17:38   ` Linus Torvalds
2006-01-15 19:15   ` Junio C Hamano
