[RFC] Submodules in GIT

git.vger.kernel.org archive mirror

* [RFC] Submodules in GIT
@ 2006-11-20 21:51 Martin Waitz
  2006-11-20 22:16 ` Jakub Narebski
                   ` (4 more replies)
  0 siblings, 5 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-20 21:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 6400 bytes --]

I am currently working on adding submodule support to GIT.
Here I am presenting some prototyping work to show how submodules could
be implemented in GIT.

What is a submodule
-------------------

A submodule is a GIT repository which is managed by another higher-layer
parent repository.  Both the submodule and the parent repository can be
used as a normal GIT repository but there are links between them.  That
means that you can merge, push and pull both on the parent and on
individual submodule level.

The parent repository is special in that it does not only hold individual
files but also complete submodule trees, complete with their own history.
The history of the parent and the submodule are independent, but each
version of the parent is linked to exactly one version of the submodule.

The submodule is special in that it has one special branch which is tracked
by the parent.  Each time something is committed to this branch, this
automatically changes the parent, just like modifying normal files.  Each
time the parent gets updated, this submodule branch gets updated
automatically.

As a submodule has all the properties of a normal GIT repository,
it can also contain submodules itself.

Previous submodule proposal
---------------------------

My first experiment in implementing submodules was a very simple one: Store
the submodule refs in the working directory of the parent repository and
include them in the version control.

This way the most important properties of submodules are easy to obtain.
Each version of the parent stores the exact version of all submodules and
with the help of some shell scripts it is easy to update the
submodule when there are updates to the parent repository.

However, this easy approach has several drawbacks.  The most important one
is that the GIT core does not know about submodules; they are only built on
top of it.  For this reason it is difficult to support fsck-objects and
prune in such a setup.

In order to permanently keep all previous stages of the parent, there is
the need to also keep all versions of the submodules which have been linked
to the parent.  All operations which have to walk the entire object database
have to know all possible references of the submodule.  This is easy if the
submodule is only fast-forwarded, without switching branches and without
removing the submodule.  Under these conditions it is enough to walk the
current reference of the submodule which is stored in its parent.  But if
there are past versions of the submodule which are not reachable by the
current submodule commit then it is difficult to keep them in the database
when the submodule is handled by a standard GIT core.  This could only be
solved by creating fake references for all possible versions of the
submodule.

New approach
------------

In order to make the GIT core know about all possible versions of the
submodule it is not enough to store one reference in the parent working
directory.  The GIT core has to be changed so that it knows about all
submodule references while traversing the parent repository.  This means
that both the submodule and the parent repository have to use the same
object database.  (I already did this for my first experiment, but it was
not really necessary at that time.)

A submodule really is part of the parent tree, so it is very natural to
add the link to the submodule commit into the GIT tree data structure.
In addition to links to blobs and other trees, trees can now also hold
a link to a commit, which in turn has the pointers to the submodule tree
and its history.  In order to differentiate a submodule entry from
normal file or directory entries, it gets a special file mode.

Directly including the submodules in the object database allows the
traversal of the entire repository, together with the parent and all
submodules.  In this way it is possible to support fsck-objects and prune
when they are executed in the parent repository.  However, submodules can
contain branches which are independent from the one stored in the parent.
So all references and the working directory index of the submodules have
to be made available to the parent, too.
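
A minimal sketch of why a shared object database with commit links in
trees helps fsck-objects and prune.  The object names and the dictionary
layout below are purely illustrative assumptions, not GIT's real storage
format.  The point is only that a walk starting at the parent also finds
submodule versions which the current submodule commit cannot reach.

```python
# Toy object database (hypothetical layout): object id -> (type, payload).
OBJECTS = {
    "blob1": ("blob", None),
    "blob2": ("blob", None),
    "subtree1": ("tree", {"lib.c": "blob1"}),
    "subtree2": ("tree", {"lib.c": "blob2"}),
    # Submodule history: sub2 has parent sub1; "side" lives on another branch.
    "sub1": ("commit", {"tree": "subtree1", "parents": []}),
    "sub2": ("commit", {"tree": "subtree2", "parents": ["sub1"]}),
    "side": ("commit", {"tree": "subtree1", "parents": []}),
    # Parent trees link to a submodule commit under the name "sub".
    "ptree1": ("tree", {"README": "blob1", "sub": "side"}),
    "ptree2": ("tree", {"README": "blob1", "sub": "sub2"}),
    "par1": ("commit", {"tree": "ptree1", "parents": []}),
    "par2": ("commit", {"tree": "ptree2", "parents": ["par1"]}),
}


def reachable(roots, objects=OBJECTS):
    """Return the set of all object ids reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        kind, payload = objects[oid]
        if kind == "commit":
            stack.append(payload["tree"])
            stack.extend(payload["parents"])
        elif kind == "tree":
            # Entries may be blobs, trees, or (with this proposal) commits.
            stack.extend(payload.values())
    return seen
```

Here reachable(["par2"]) includes "side" because an older parent version
links to it, while reachable(["sub2"]) does not.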

When a parent containing submodules is checked out, the submodule entry is
stored in the index, just like it is done for all normal files.  But instead
of writing one file to the working directory, a complete GIT repository is
created (with the object database linked to the parent one).  This submodule
gets a reference "refs/heads/module" from the parent's index entry.
The index entry in the parent can be updated with update-index just
like other entries.

When a merge in the parent has to resolve changes in the submodule,
then it does exactly the same as for files: at first it tries to resolve
it in the index, and if this is not possible it stores stages 1-3
in the index and tries a content merge.  The only difference with submodules
is that the content merge is not possible with a simple diff3 call;
instead the GIT merge machinery has to recurse into the submodule.
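
The index-level step can be sketched as a plain three-way comparison of
the entry's object names.  This is an illustration under assumptions: the
function name and return shape are hypothetical, and the recursive
submodule merge itself is not shown (it is not implemented yet).

```python
def merge_entry(base, ours, theirs):
    """Try to resolve one index entry without looking at its content.

    Arguments are object names (for a submodule entry: commit ids).
    Returns (result, needs_content_merge).
    """
    if ours == theirs:
        return ours, False      # both sides agree
    if base == ours:
        return theirs, False    # only their side changed the entry
    if base == theirs:
        return ours, False      # only our side changed the entry
    # Both sides changed it differently: keep stages 1-3 and fall back to a
    # content merge (diff3 for files, a recursive merge for a submodule).
    return None, True
```

Only the last case would need the merge machinery to recurse into the
submodule; the first three are resolved in the index alone.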

Implementation
--------------

Obviously, all the tree traversal routines have to be modified to recognize
a submodule and to correctly traverse it.  The submodule entry gets
a special S_IFSOCK file mode to distinguish it from other entries.
This special file mode is used for both the tree entries in the object
database and for the index entries of the submodule in the parent index.
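
As a rough sketch of how a traversal routine could tell the entry types
apart by file mode: the constants below are the usual POSIX file-type
bits written out as octal literals, and the function name is a
hypothetical helper, not something taken from the module2 branch.

```python
# File-type bits as commonly defined on POSIX systems (assumed values).
S_IFMT = 0o170000    # mask selecting the file-type bits
S_IFDIR = 0o040000   # directory  -> tree entry
S_IFREG = 0o100000   # regular    -> blob entry
S_IFLNK = 0o120000   # symlink    -> blob holding the link target
S_IFSOCK = 0o140000  # socket     -> proposed marker for a submodule


def entry_kind(mode):
    """Classify a tree or index entry by the type bits of its mode."""
    fmt = mode & S_IFMT
    if fmt == S_IFDIR:
        return "tree"
    if fmt == S_IFREG:
        return "blob"
    if fmt == S_IFLNK:
        return "symlink"
    if fmt == S_IFSOCK:
        return "submodule"
    raise ValueError("unknown entry mode: %o" % mode)
```

A socket can never be a tracked file, which is what makes its type bits
free to reuse as a marker in this proposal.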

Some basic low-level commands now (more or less) cope with submodules.
Merging of submodules has not yet been implemented.

The current status can be viewed in
http://git.admingilde.org/tali/git.git/module2
(on top of next)

The code now passes the small test-script so at least a little bit of
it must be working ;-).  Please feel free to give it a try and complain
if it does not work the way you expect.

What's next?
------------

The most important next step is to commit to some object database format
for submodules.  So please, do give feedback about the proposed changes
to the tree object.

The second most important step is to make it possible to merge submodules.

After that, for sure there are enough bugs to fix to keep me busy for some
time... ;-)

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
@ 2006-11-20 22:16 ` Jakub Narebski
  2006-11-20 22:28   ` Martin Waitz
  2006-11-20 22:43   ` Junio C Hamano
  2006-11-20 22:49 ` Jakub Narebski
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-11-20 22:16 UTC (permalink / raw)
  To: git

Martin Waitz wrote:

> A submodule really is part of the parent tree, so it is very natural to
> add the link to the submodule commit into the GIT tree data structure.
> In addition to links to blobs and other trees, they can now also hold
> a link to a commit, which in turn has the pointers to the submodule tree
> and its history.  In order to differenciate a submodule entry with
> normal file or directory entries, they get a special file mode.

Erm... isn't a _type_ of tree entry saved somewhere? Currently it can
be only 'tree' or 'blob', what you do is adding 'commit' (then permissions
are permissions of top tree of module, of course).

By the way, in todo branch, in Subpro.txt, there is talk about adding
link to submodule trees in _commit object_... well link to submodule tree
or commit, with the "mount point".
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 22:16 ` Jakub Narebski
@ 2006-11-20 22:28   ` Martin Waitz
  2006-11-20 22:43   ` Junio C Hamano
  1 sibling, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-20 22:28 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]

hoi :)

On Mon, Nov 20, 2006 at 11:16:45PM +0100, Jakub Narebski wrote:
> Martin Waitz wrote:
> > A submodule really is part of the parent tree, so it is very natural to
> > add the link to the submodule commit into the GIT tree data structure.
> > In addition to links to blobs and other trees, they can now also hold
> > a link to a commit, which in turn has the pointers to the submodule tree
> > and its history.  In order to differenciate a submodule entry with
> > normal file or directory entries, they get a special file mode.
>
> Erm... isn't a _type_ of tree entry saved somewhere? Currently it can
> be only 'tree' or 'blob', what you do is adding 'commit' (then permissions
> are permissions of top tree of module, of course).

It is saved inside the object which is being referred to.
Right now tree objects are also identified by their file mode and not
by the type of object which is referenced.

> By the way, in todo branch, in Subpro.txt, there is talk about adding
> link to submodule trees in _commit object_... well link to submodule tree
> or commit, with the "mount point".

But isn't the submodule really part of the tree?
Right now the commit is used to construct the history of one project.
And a submodule is not part of the history of the parent, it is part
of the parent's tree.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 22:16 ` Jakub Narebski
  2006-11-20 22:28   ` Martin Waitz
@ 2006-11-20 22:43   ` Junio C Hamano
  2006-11-20 23:02     ` Jakub Narebski
  2006-11-20 23:05     ` Linus Torvalds
  1 sibling, 2 replies; 252+ messages in thread
From: Junio C Hamano @ 2006-11-20 22:43 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: jnareb

Jakub Narebski <jnareb@gmail.com> writes:

> By the way, in todo branch, in Subpro.txt, there is talk about adding
> link to submodule trees in _commit object_... well link to submodule tree
> or commit, with the "mount point".

That was shot down by Linus and I agree with him.  "bind" was a
bad idea because binding of a particular subproject commit into
a tree is a property of the tree, not one of the commits that
happen to have that tree.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
  2006-11-20 22:16 ` Jakub Narebski
@ 2006-11-20 22:49 ` Jakub Narebski
  2006-11-21  7:21 ` Shawn Pearce
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-11-20 22:49 UTC (permalink / raw)
  To: git

Could you please add to http://git.or.cz/gitwiki/SubprojectSupport
(even if all it would be is a link to archive of this thread)? TIA.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 22:43   ` Junio C Hamano
@ 2006-11-20 23:02     ` Jakub Narebski
  2006-11-20 23:52       ` Martin Waitz
  2006-11-21  1:31       ` Sam Vilain
  2006-11-20 23:05     ` Linus Torvalds
  1 sibling, 2 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-11-20 23:02 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> By the way, in todo branch, in Subpro.txt, there is talk about adding
>> link to submodule trees in _commit object_... well link to submodule tree
>> or commit, with the "mount point".
> 
> That was shot down by Linus and I agree with him.  "bind" was a
> bad idea because binding of a particular subproject commit into
> a tree is a property of the tree, not one of the commits that
> happen to have that tree.
  
"bind" was kind of "mount tree" idea; I agree that adding subproject
commits to trees is a better idea than adding commits or trees to
the superproject commit object.

By the way, what permissions does the subproject tree get?

I wonder if it makes sense to be able to add tag objects instead
of commit objects to trees (depeel to tree or blob)...
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 22:43   ` Junio C Hamano
  2006-11-20 23:02     ` Jakub Narebski
@ 2006-11-20 23:05     ` Linus Torvalds
  2006-11-20 23:25       ` J. Bruce Fields
                         ` (3 more replies)
  1 sibling, 4 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-11-20 23:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, git

On Mon, 20 Nov 2006, Junio C Hamano wrote:
> 
> That was shot down by Linus and I agree with him.  "bind" was a
> bad idea because binding of a particular subproject commit into
> a tree is a property of the tree, not one of the commits that
> happen to have that tree.

Yes. I think it would be a _fine_ idea to have a new tree-entry type that 
points to a sub-commit, but it really does need to be on a "tree level", 
not a commit level.

If it's on a tree level, getting things like "git diff" etc to work is not 
impossible, and it will also fit very well into the whole git 
infrastructure.

So right now a tree entry can be another tree or a blob - and the only 
extension would be to add a "commit" type (which would largely _act_ as a 
tree entry, at least for sorting, ie it would use the same "sorts as if it 
had a '/' at the end" logic).
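
A small sketch of that sorting rule, assuming entries are compared
bytewise by name and that directory-like entries (trees, and the proposed
commit entries) compare as if their name ended in '/'.  The helper name
is hypothetical.

```python
def tree_sort_key(name, is_dir_like):
    """Sort key for a tree entry; dir-like entries get a trailing '/'."""
    key = name.encode("utf-8")
    return key + b"/" if is_dir_like else key


# "foo" is a tree (or a sub-commit); the other two are plain files.
entries = [("foo.c", False), ("foo", True), ("foo-bar", False)]
ordered = sorted(entries, key=lambda e: tree_sort_key(e[0], e[1]))
names = [name for name, _ in ordered]
```

Because '-' and '.' both sort before '/', the dir-like "foo" ends up
after "foo-bar" and "foo.c", whereas a plain file named "foo" would sort
first.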

Now, to get everything to work seamlessly within such a commit thing 
might be a fair amount of work, but I'm not sure you even _need_ to. It 
might be ok to just say "subproject 'xyzzy' differs" in the diff, for 
example, and have some rudimentary support for "git status" etc talking 
about subprojects that need to be committed.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 23:05     ` Linus Torvalds
@ 2006-11-20 23:25       ` J. Bruce Fields
  2006-11-20 23:33         ` Martin Waitz
  2006-11-20 23:29       ` Martin Waitz
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 252+ messages in thread
From: J. Bruce Fields @ 2006-11-20 23:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jakub Narebski, git

On Mon, Nov 20, 2006 at 03:05:47PM -0800, Linus Torvalds wrote:
> 
> 
> On Mon, 20 Nov 2006, Junio C Hamano wrote:
> > 
> > That was shot down by Linus and I agree with him.  "bind" was a
> > bad idea because binding of a particular subproject commit into
> > a tree is a property of the tree, not one of the commits that
> > happen to have that tree.
> 
> Yes. I think it would be a _fine_ idea to have a new tree-entry type that 
> points to a sub-commit, but it really does need to be on a "tree level", 
> not a commit level.

Would it also be possible to allow the "Tree:" line in the commit object
to refer to a commit, or does the root of the project need to be a
special case?


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 23:05     ` Linus Torvalds
  2006-11-20 23:25       ` J. Bruce Fields
@ 2006-11-20 23:29       ` Martin Waitz
  2006-11-21  0:10       ` Junio C Hamano
  2006-11-21 22:31       ` Yann Dirson
  3 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-20 23:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jakub Narebski, git

[-- Attachment #1: Type: text/plain, Size: 782 bytes --]

hoi :)

On Mon, Nov 20, 2006 at 03:05:47PM -0800, Linus Torvalds wrote:
> Now, to get everything to work seamlessly within such a commit thing 
> might be a fair amount of work, but I'm not sure you even _need_ to. It 
> might be ok to just say "subproject 'xyzzy' differs" in the diff, for 
> example, and have some rudimentary support for "git status" etc talking 
> about subprojects that need to be committed.

this is exactly the status of my implementation at the moment ;-)

Well, it does not yet explicitly say that a subproject differs;
it just creates a diff of the two commit objects.

I guess we need some command line option to say whether we only want
to know that the submodule changed or whether the diff should
recurse into it.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 23:25       ` J. Bruce Fields
@ 2006-11-20 23:33         ` Martin Waitz
  2006-11-21 18:01           ` J. Bruce Fields
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-11-20 23:33 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linus Torvalds, Junio C Hamano, Jakub Narebski, git

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

hoi :)

On Mon, Nov 20, 2006 at 06:25:07PM -0500, J. Bruce Fields wrote:
> On Mon, Nov 20, 2006 at 03:05:47PM -0800, Linus Torvalds wrote:
> > Yes. I think it would be a _fine_ idea to have a new tree-entry type that 
> > points to a sub-commit, but it really does need to be on a "tree level", 
> > not a commit level.
> 
> Would it also be possible to allow the "Tree:" line in the commit object
> to refer to a commit, or does the root of the project need to be a
> special case?

this would then be something like the branch-archival proposal.
The user interface for such a beast would be difficult, as you have
to somehow specify if you mean the inner or outer repository.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 23:02     ` Jakub Narebski
@ 2006-11-20 23:52       ` Martin Waitz
  2006-11-21  1:31       ` Sam Vilain
  1 sibling, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-20 23:52 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 434 bytes --]

hoi :)

On Tue, Nov 21, 2006 at 12:02:34AM +0100, Jakub Narebski wrote:
> By the way, what permissions get the subproject tree?

In my approach no permissions are saved in the object database,
only the special bit to mark the submodule.
When checking out, the directory is created 0777 modulo umask,
just as other directories.  Then the submodule contents
are checked out with their normal permissions.
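
For illustration, "0777 modulo umask" means the requested mode with the
umask bits cleared.  The function name below is a hypothetical helper.

```python
def created_mode(requested, umask):
    """Permission bits a new directory ends up with under a given umask."""
    return requested & ~umask & 0o7777


# With the common umask 022, a directory requested as 0777 becomes 0755.
mode = created_mode(0o777, 0o022)
```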

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 23:05     ` Linus Torvalds
  2006-11-20 23:25       ` J. Bruce Fields
  2006-11-20 23:29       ` Martin Waitz
@ 2006-11-21  0:10       ` Junio C Hamano
  2006-11-21  0:42         ` Jakub Narebski
  2006-11-21  6:27         ` Martin Waitz
  2006-11-21 22:31       ` Yann Dirson
  3 siblings, 2 replies; 252+ messages in thread
From: Junio C Hamano @ 2006-11-21  0:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Jakub Narebski

Linus Torvalds <torvalds@osdl.org> writes:

> Now, to get everything to work seamlessly within such a commit thing 
> might be a fair amount of work, but I'm not sure you even _need_ to. It 
> might be ok to just say "subproject 'xyzzy' differs" in the diff, for 
> example, and have some rudimentary support for "git status" etc talking 
> about subprojects that need to be committed.

I agree with the static "diff" part, and probably "checkout" and
"merge" are not all that difficult.

However, if I recall correctly, it was rather nightmarish to
make this also work for reachability traversal necessary for
pack generation.  It was painful enough even when the bind was
at the commit level (which was way simpler to handle), but to do
this the right way, the bind needs to be done at the tree level,
and "rev-list --objects foo..bar" would need some way to limit
the commit ancestry chain of subproject at the same time, by
computing the commit ancestry of the embedded commits in the
trees.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-21  0:10       ` Junio C Hamano
@ 2006-11-21  0:42         ` Jakub Narebski
  2006-11-21  6:21           ` Martin Waitz
  2006-11-21  6:27         ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-11-21  0:42 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
>> Now, to get everything to work seamlessly within such a commit thing 
>> might be a fair amount of work, but I'm not sure you even _need_ to. It 
>> might be ok to just say "subproject 'xyzzy' differs" in the diff, for 
>> example, and have some rudimentary support for "git status" etc talking 
>> about subprojects that need to be committed.
> 
> I agree with the static "diff" part, and probably "checkout" and
> "merge" are not all that difficult.
> 
> However, if I recall correctly, it was rather nightmarish to
> make this also work for reachability traversal necessary for
> pack generation.  It was painful enough even when the bind was
> at the commit level (which was way simpler to handle), but to do
> this the right way, the bind needs to be done at the tree level,
> and "rev-list --objects foo..bar" would need some way to limit
> the commit ancestry chain of subproject at the same time, by
> computing the commit ancestry of the embedded commits in the
> trees.

Perhaps it would be best to join those two subproject support
solutions together: "bind" tree/commit mount header in commit
object, and "commit" entry in a tree. But I agree that revision
walking needs to be revamped... well, unless you always have
project and subproject in the same repository, and subprojects
are branches in the project too... 

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 23:02     ` Jakub Narebski
  2006-11-20 23:52       ` Martin Waitz
@ 2006-11-21  1:31       ` Sam Vilain
  1 sibling, 0 replies; 252+ messages in thread
From: Sam Vilain @ 2006-11-21  1:31 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
> I wonder if it makes sense to be able to add tag objects instead
> of commit objects to trees (depeel to tree or blob)...
>   

I'd say "as well as", and the semantics should be that to something
browsing the filesystem, a tag looks like the type of object it refers
to. eg, tag a tree, it's a tree, tag a commit, it's a sub-project/tree,
tag a blob, it's a file.
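
A sketch of those semantics, assuming a toy object table where a tag
simply stores the id of the object it points to.  All names here are
made up for the example.

```python
# Hypothetical objects: id -> (type, target id for tags / None otherwise).
DEMO_OBJECTS = {
    "b1": ("blob", None),
    "t1": ("tree", None),
    "c1": ("commit", None),
    "tag_blob": ("tag", "b1"),
    "tag_tree": ("tag", "t1"),
    "tag_commit": ("tag", "c1"),
    "tag_of_tag": ("tag", "tag_commit"),
}

APPEARANCE = {
    "blob": "file",
    "tree": "directory",
    "commit": "sub-project directory",
}


def peel(oid, objects=DEMO_OBJECTS):
    """Follow tags until a non-tag object is reached; return (id, type)."""
    kind, target = objects[oid]
    while kind == "tag":
        oid = target
        kind, target = objects[oid]
    return oid, kind


def appears_as(oid, objects=DEMO_OBJECTS):
    """What a tree entry pointing at oid looks like in the filesystem."""
    return APPEARANCE[peel(oid, objects)[1]]
```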

The use case I'm thinking of is semi-transparent storing of archives;
instead of storing the archive body, store a tag which contains the
"extra" information - like the gzip headers for a gz and which
compression options are needed to reproduce the same output stream. For
a tar, the per-file information such as the filestamps, owner and
permissions are recorded, and it points to a tree. A clever porcelain
could detect these file types, and make sure the uncompressed streams
are stored.

People who are using clients which don't understand these tag objects in
between will get the contents of the node checked out instead, so
instead of getting "foo.tar.gz" as a file, they get a "foo.tar.gz/" directory.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-21  0:42         ` Jakub Narebski
@ 2006-11-21  6:21           ` Martin Waitz
  2006-11-21 10:04             ` Jakub Narebski
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-11-21  6:21 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 727 bytes --]

hoi :)

On Tue, Nov 21, 2006 at 01:42:22AM +0100, Jakub Narebski wrote:
> Perhaps it would be best to join those two subproject support
> solutions together: "bind" tree/commit mount header in commit
> object, and "commit" entry in a tree.

But which is the authoritative source then?
Does it give any more information?

The advantage in your proposal would be that submodules would
be visible immediately when looking at the commit,
without having to traverse the entire tree.
This may be worthwhile when showing the combined history of parent
and submodules.

But still this looks like "caching submodule information in the
commit object" and I do not know if we really want to do that.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-21  0:10       ` Junio C Hamano
  2006-11-21  0:42         ` Jakub Narebski
@ 2006-11-21  6:27         ` Martin Waitz
  2006-11-21  7:36           ` Junio C Hamano
  1 sibling, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-11-21  6:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git, Jakub Narebski

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

hoi :)

On Mon, Nov 20, 2006 at 04:10:50PM -0800, Junio C Hamano wrote:
> However, if I recall correctly, it was rather nightmarish to
> make this also work for reachability traversal necessary for
> pack generation.  It was painful enough even when the bind was
> at the commit level (which was way simpler to handle), but to do
> this the right way, the bind needs to be done at the tree level,
> and "rev-list --objects foo..bar" would need some way to limit
> the commit ancestry chain of subproject at the same time, by
> computing the commit ancestry of the embedded commits in the
> trees.

This at least seems to work already.
The UNINTERESTING flag is recursively set for the submodule
commits while walking the object chain.

But I must admit that I only did very simple tests up to now.
Do you have any special constellations in mind which were
difficult to support?

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
  2006-11-20 22:16 ` Jakub Narebski
  2006-11-20 22:49 ` Jakub Narebski
@ 2006-11-21  7:21 ` Shawn Pearce
  2006-11-22  5:29 ` Petr Baudis
  2006-12-02 20:16 ` Jakub Narebski
  4 siblings, 0 replies; 252+ messages in thread
From: Shawn Pearce @ 2006-11-21  7:21 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Martin Waitz <tali@admingilde.org> wrote:
> I am currently working on adding submodule support to GIT.
> Here I am presenting some prototyping work to show how submodules could
> be implemented in GIT.

Aside from a GUI for Git that I can give to a non-technical
user and have them be productive with (and without getting "this
sucks!" complaints from them), submodule support is the next highest
priority feature for me in Git.  So as soon as I can get git-gui
to that point I'll probably redirect as much of my "Git time"
as I can to testing your submodule implementation.

> When a merge in the parent has to resolve changes in the submodule,
> then it does exactly the same as for files: at first it is tried to resolve
> it in the index and if this is not possible it will have store stages 1-3
> in the index and tries a content merge.  The only difference with submodules
> is that the content merge is not possible with a simple diff3 call, but
> that the GIT merge machinery has to recurse into the submodule.

Right.  And what's really cool about that is many times a subproject
merge will be trivial, so top level project merges will still be
trivial in index merges.  But what's complicated about that is you
need to make sure the subproject working directory is fast-forwarded
to the new commit.  :)

> The most important next step is to commit to some object database format
> for submodules.  So please, do give feedback about the proposed changes
> to the tree object.

I think the S_IFSOCK approach is the right way to go here, and thus
I'm reasonably happy with this repository format.  But Linus and
Junio do tend to be better at this than I... :-)

-- 

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-21  6:27         ` Martin Waitz
@ 2006-11-21  7:36           ` Junio C Hamano
  2006-11-21  7:55             ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Junio C Hamano @ 2006-11-21  7:36 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Linus Torvalds, git

Martin Waitz <tali@admingilde.org> writes:

> On Mon, Nov 20, 2006 at 04:10:50PM -0800, Junio C Hamano wrote:
>
>> However, if I recall correctly, it was rather nightmarish to
>> make this also work for reachability traversal necessary for
>> pack generation.  It was painful enough even when the bind was
>> at the commit level (which was way simpler to handle), but to do
>> this the right way, the bind needs to be done at the tree level,
>> and "rev-list --objects foo..bar" would need some way to limit
>> the commit ancestry chain of subproject at the same time, by
>> computing the commit ancestry of the embedded commits in the
>> trees.
>
> This at least seems to work already.
> The UNINTERESTING flag is recursively set for the submodule
> commits while walking the object chain.

I think that is fine as long as we somehow enforce the topology
of submodule to be similar to the toplevel topology.  Otherwise
I suspect it leads to unintuitive behaviour.

Suppose that the ancestry chain for the toplevel is A, A~1, A~2
and you asked for "A~2..A".  A submodule is bound at tree "sub/"
and suppose A:sub/ == B, A~1:sub/ == C, and A~2:sub/ == D.

Now further suppose the ancestry chain for B, C and D are like
this:

              o---C
             /     \
     ...o---o---D---B

A naive implementation of "--objects A~2..A" would propagate
UNINTERESTING to D and leave B and C unmarked.  Would it however
be reasonable to include commits marked as 'o'?

I am not trying to be negative here, but just raising things
that I did not think through when I tried to tackle it the last
time...

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-21  7:36           ` Junio C Hamano
@ 2006-11-21  7:55             ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-21  7:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

[-- Attachment #1: Type: text/plain, Size: 972 bytes --]

hoi :)

On Mon, Nov 20, 2006 at 11:36:55PM -0800, Junio C Hamano wrote:
> I think that is fine as long as we somehow enforce the topology
> of submodule to be similar to the toplevel topology.  Otherwise
> I suspect it leads to unintuitive behaviour.
> 
> Suppose that the ancestry chain for the toplevel are A, A~1, A~2
> and you asked for "A~2..A".  A submodule is bound at tree "sub/"
> and suppose A:sub/ == B, A~1:sub/ == C, and A~2:sub/ == D.
> 
> Now further suppose the ancestry chain for B, C and D are like
> this:
> 
>               o---C
>              /     \
>      ...o---o---D---B
> 
> A naive implementation of "--objects A~2..A" would propagate
> UNINTERESTING to D and mark B and C unmarked.  Would it however
> be reasonable to include commits marked as 'o'?

I think it is reasonable to just go on as in a normal repository.
That is, pretend we want to list D..B and mark all commits which
are reachable.
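
A minimal sketch of that D..B listing on the graph from the diagram,
assuming plain reachability and nothing submodule-specific.  The names
"o1", "o2" and "o3" are made up here for the three unnamed 'o' commits
("o3" is the one on the side branch leading to C):

```python
# Toy commit graph from the diagram; each commit maps to its parents.
# "o1", "o2", "o3" are hypothetical names for the unnamed 'o' commits.
PARENTS = {
    "o1": [],
    "o2": ["o1"],
    "o3": ["o2"],      # side branch forks off after o2
    "C": ["o3"],
    "D": ["o2"],
    "B": ["D", "C"],   # B merges D and C
}


def ancestors(tip):
    """Return tip plus every commit reachable from it."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def commit_range(exclude, include):
    """Commits reachable from `include` but not from `exclude`."""
    return ancestors(include) - ancestors(exclude)


print(sorted(commit_range("D", "B")))  # ['B', 'C', 'o3']
```

So the 'o' on the side branch is listed together with C and B, while the
two 'o' commits below D stay uninteresting.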

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-11-21  6:21           ` Martin Waitz
@ 2006-11-21 10:04             ` Jakub Narebski
  2006-11-21 11:49               ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-11-21 10:04 UTC (permalink / raw)
  To: git

Martin Waitz wrote:

> On Tue, Nov 21, 2006 at 01:42:22AM +0100, Jakub Narebski wrote:
>> Perhaps it would be best to join those two subproject support
>> solutions together: "bind" tree/commit mount header in commit
>> object, and "commit" entry in a tree.
> 
> But which is the authoritative source then?
> Does it give any more information?

Both should contain the same information, otherwise repository is corrupt
(is in inconsistent state).

"bind" header in commit objects is meant as a kind of shortcut, to ease
reachability checking (you don't need to recurse into directories).

> The advantage in your proposal would be that submodules would
> be visible immediately when looking at the commit,
> without having to traverse the entire tree.
> This may be worthwhile when showing the combined history of parent
> and submodules.

That was the idea.

> But still this looks like "caching submodule information in the
> commit object" and I do not know if we really want to do that.

Well, we would be repeating information, sure. But we can put additional
information in "bind" header except sha1 of commit and mount point...
although I cannot think what... :)

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: [RFC] Submodules in GIT
  2006-11-21 10:04             ` Jakub Narebski
@ 2006-11-21 11:49               ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-21 11:49 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

hoi :)

On Tue, Nov 21, 2006 at 11:04:46AM +0100, Jakub Narebski wrote:
> "bind" header in commit objects is meant as a kind of shortcut, to ease
> reachability checking (you don't need to recurse into directories).

Well, but you already have to recurse to find all objects which are
reachable by a commit, so you don't lose anything.

> > The advantage in your proposal would be that submodules would
> > be visible immediately when looking at the commit,
> > without having to traverse the entire tree.
> > This may be worthwhile when showing the combined history of parent
> > and submodules.
> 
> That was the idea.

On the other hand that only has to be done once anyway.
After you traversed the tree once you can create your own
(in memory) cache of submodules connected to the tree.
While walking the commits backwards, you only have to check those
parts of the tree which have changed.
So it may even be suitable for larger repositories.
But clearly the cost is not as low as with the in-commit cache.
So we have to weigh the complexity of the data storage against
runtime complexity.  Opinions?
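
To make the in-memory cache idea concrete, here is a rough sketch,
assuming a toy object store (the TREES table, its ids and the "commit"
entry kind are all invented for illustration, not the real GIT format).
Results are cached per tree id, so a subtree that did not change between
two commits is only read once:

```python
# Hypothetical object store: tree id -> list of (kind, name, object id).
# kind is "blob", "tree", or "commit" (a link to a submodule commit).
TREES = {
    "t_lib": [("blob", "util.c", "b1")],
    "t_sub": [("commit", "kernel", "K1"), ("blob", "README", "b2")],
    "t_sub2": [("commit", "kernel", "K2"), ("blob", "README", "b2")],
    "root_old": [("tree", "lib", "t_lib"), ("tree", "vendor", "t_sub")],
    "root_new": [("tree", "lib", "t_lib"), ("tree", "vendor", "t_sub2")],
}

visited = []  # records which trees were actually read from the store


def submodules(tree_id, cache):
    """Return {path: submodule commit} for a tree, memoized by tree id."""
    if tree_id in cache:
        return cache[tree_id]
    visited.append(tree_id)
    found = {}
    for kind, name, oid in TREES[tree_id]:
        if kind == "commit":
            found[name] = oid
        elif kind == "tree":
            for path, commit in submodules(oid, cache).items():
                found[name + "/" + path] = commit
    cache[tree_id] = found
    return found


cache = {}
new = submodules("root_new", cache)  # newest commit's tree first
old = submodules("root_old", cache)  # then its parent's tree
print(new, old)
print(visited)  # "t_lib" appears only once: it is shared by both roots
```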

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-11-20 23:33         ` Martin Waitz
@ 2006-11-21 18:01           ` J. Bruce Fields
  2006-11-21 19:32             ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: J. Bruce Fields @ 2006-11-21 18:01 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Linus Torvalds, Junio C Hamano, Jakub Narebski, git

On Tue, Nov 21, 2006 at 12:33:34AM +0100, Martin Waitz wrote:
> On Mon, Nov 20, 2006 at 06:25:07PM -0500, J. Bruce Fields wrote:
> > Would it also be possible to allow the "Tree:" line in the commit object
> > to refer to a commit, or does the root of the project need to be a
> > special case?
> 
> this would then be something like the branch-archival proposal.

Do you have any pointers to previous discussion?  (A couple obvious
searches don't turn up anything for me.)


* Re: [RFC] Submodules in GIT
  2006-11-21 18:01           ` J. Bruce Fields
@ 2006-11-21 19:32             ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-21 19:32 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linus Torvalds, Junio C Hamano, Jakub Narebski, git

On Tue, Nov 21, 2006 at 01:01:27PM -0500, J. Bruce Fields wrote:
> On Tue, Nov 21, 2006 at 12:33:34AM +0100, Martin Waitz wrote:
> > On Mon, Nov 20, 2006 at 06:25:07PM -0500, J. Bruce Fields wrote:
> > > Would it also be possible to allow the "Tree:" line in the commit object
> > > to refer to a commit, or does the root of the project need to be a
> > > special case?
> > 
> > this would then be something like the branch-archival proposal.
> 
> Do you have any pointers to previous discussion?  (A couple obvious
> searches don't turn up anything for me.)

Aug 04 Eric W. Biederman    [RFC][PATCH] Branch history

I really think that using subprojects can be used for this workflow, too.
But adding a submodule directly to the root is not really possible,
we'd have to use special user interfaces for that, even when the
git-core might be able to handle it.
But what might be possible is to have one toplevel history-tracking
repository in e.g. ~/src and then add all the repositories you work
with as a submodule.  Whenever you want to record the history of
some project, you can simply commit it to ~/src.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-11-20 23:05     ` Linus Torvalds
                         ` (2 preceding siblings ...)
  2006-11-21  0:10       ` Junio C Hamano
@ 2006-11-21 22:31       ` Yann Dirson
  2006-11-21 22:51         ` Linus Torvalds
  3 siblings, 1 reply; 252+ messages in thread
From: Yann Dirson @ 2006-11-21 22:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jakub Narebski, git

On Mon, Nov 20, 2006 at 03:05:47PM -0800, Linus Torvalds wrote:
> On Mon, 20 Nov 2006, Junio C Hamano wrote:
> > 
> > That was shot down by Linus and I agree with him.  "bind" was a
> > bad idea because binding of a particular subproject commit into
> > a tree is a property of the tree, not one of the commits that
> > happen to have that tree.
> 
> Yes. I think it would be a _fine_ idea to have a new tree-entry type that 
> points to a sub-commit, but it really does need to be on a "tree level", 
> not a commit level.

I'm not sure I get the reason why the submodule should not be recorded
on "commit level".

What I'm thinking of would be that the submodule tree would just be a
standard entry of a tree in the supermodule, and we could record the
submodule commit (pointing to the submodule tree) in the supermodule
commit.

This idea came when thinking about implementing partial merges.  That
is, when different people are responsible for different parts of the
tree, and thus when merging a given branch, each dev has to make only
a partial merge of the full tree.
Having submodule commits referenced directly from the supercommit would
make it much easier to finalize the merge (ie. merging the full project
while taking into account that some subtrees have been merged already).

Best regards,
-- 

* Re: [RFC] Submodules in GIT
  2006-11-21 22:31       ` Yann Dirson
@ 2006-11-21 22:51         ` Linus Torvalds
  2006-11-21 22:59           ` Linus Torvalds
  2006-11-21 23:54           ` Yann Dirson
  0 siblings, 2 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-11-21 22:51 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Junio C Hamano, Jakub Narebski, git

On Tue, 21 Nov 2006, Yann Dirson wrote:
> 
> I'm not sure I get the reason why the submodule should not be recorded
> on "commit level".

Because that would be STUPID.

What do the submodules have to do with the commit level? Nothing. Nada. 
Zero.

Submodules are _directories_. They can be anywhere in the directory tree. 
If you try to encode that in a commit message, you're going to totally 
break the whole notion of trying to "diff" two trees. 

All of git is designed around the notion that a tree is the directory 
structure. If you put directory structure somewhere else, you totally 
screw all abstractions.

Now, if that weren't enough, let me enumerate _another_ reason why it's 
idiotic and wrong, namely the fact that a "commit" is fundamentally the 
wrong place to add something like that _anyway_. Quite apart from the fact 
that we describe directory trees with (wait for it): "tree objects", the 
thing is, a commit is about a totally different _dimension_ altogether. 

The only and _whole_ point of a "commit" is to describe the "time 
dimension". Something that doesn't always change in time should not be in 
a commit object, because it is by definition not what a commit is all 
about. A commit should describe the relationship of itself to other 
commits, ie it's a "how did this change".

And a sub-project simply doesn't even _do_ that. Much of the time, a 
subproject stays constant, and is not something that comes and goes on an 
individual commit basis. 

I don't understand why people are so fixated with putting things in the 
wrong object. WHY do people want to put crap in the "commit" object? 
People have wanted to put "rename" information there (which is stupid for 
all the same reasons: renames _remain_. They aren't a one-time event. If 
something was renamed in commit X, it will _remain_ renamed in commit X+1, 
so it's clearly not really a "commit X" thing)

Think of it this way:

 - if something _only_ makes sense on an _individual commit_ level, it 
   goes into the "commit object". But if it makes sense for "git diff",
   then it MUST NOT be in a commit object, because you do "git diff" over
   a big _range_ of commit objects.

Think "git show". The "author" of a commit is only associated with a 
_single_ commit. It thus goes into the commit object, and nowhere else. 
Same goes for time, and commit message. A commit message is fundamentally 
a "this explains this _one_ commit".

But anything that you expect to have in a "range" of commits MUST NOT be 
in a "commit object". If I do "git diff v2.6.13..v2.6.14", and I expect 
the behaviour you want to encode to show up (and dammit, subprojects very 
much fall under that heading - exactly the same way renames must have 
meaning _outside_ of a single commit) then clearly it is NOT something 
that is associated with any individual commits. It's something that is 
associated with the _state_ of the project.

And the _state_ of the project is the "tree". Not the commit. The commit 
is about the _history_ of the project.

So please understand this: "commit" is about the time-dimension 
("history"). "tree" is about the space-dimension ("state"). The two are 
_related_, but they are also very much different concepts, and "related" 
does not mean "you can mix them up".

Sub-projects are clearly not about "time". They are about "state".
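
As an illustration of the split, here is a toy model (the Commit class,
the ("commit", id) tree-entry kind and all object ids are assumptions
made up for this sketch, not the real object format).  History lives in
the commit, state lives in the tree, and a diff over a range only ever
compares two trees:

```python
from dataclasses import dataclass


@dataclass
class Commit:
    tree: dict      # state: path -> (kind, object id)
    parents: tuple  # history: links to earlier commits
    author: str     # only meaningful for this one commit
    message: str    # explains this one commit


def diff_trees(old, new):
    """Compare two states; no commit metadata is consulted."""
    changes = {}
    for path in sorted(set(old) | set(new)):
        if old.get(path) != new.get(path):
            changes[path] = (old.get(path), new.get(path))
    return changes


# The submodule is a tree entry of kind "commit" (hypothetical kind name).
c1 = Commit({"Makefile": ("blob", "m1"), "sub": ("commit", "S1")}, (), "a", "start")
c2 = Commit({"Makefile": ("blob", "m1"), "sub": ("commit", "S5")}, ("c1",), "b", "bump")
c3 = Commit({"Makefile": ("blob", "m1"), "sub": ("commit", "S9")}, ("c2",), "c", "bump")

# Diffing the endpoints of a range shows the submodule change directly,
# without looking at anything recorded in c2.
print(diff_trees(c1.tree, c3.tree))
```

If the submodule link were stored in the commit header instead, a diff
across a range would have to consult commit metadata as well as the two
endpoint trees.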

* Re: [RFC] Submodules in GIT
  2006-11-21 22:51         ` Linus Torvalds
@ 2006-11-21 22:59           ` Linus Torvalds
  2006-11-21 23:54           ` Yann Dirson
  1 sibling, 0 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-11-21 22:59 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Junio C Hamano, Jakub Narebski, git

On Tue, 21 Nov 2006, Linus Torvalds wrote:
> 
> Submodules are _directories_.

Side note - you can do submodules other ways, but if you do, you'll almost 
certainly go crazy. 

You could, for example, make submodules be some kind of "union 
filesystem", where you allow overlapping trees. It's conceptually 
possible. It's also horribly horribly wrong, if only because I guarantee 
that you'll have so many problems with it that you will only end up with a 
mess that is even worse than "branches" in CVS.

* Re: [RFC] Submodules in GIT
  2006-11-21 22:51         ` Linus Torvalds
  2006-11-21 22:59           ` Linus Torvalds
@ 2006-11-21 23:54           ` Yann Dirson
  2006-11-22  3:40             ` Shawn Pearce
  1 sibling, 1 reply; 252+ messages in thread
From: Yann Dirson @ 2006-11-21 23:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jakub Narebski, git

On Tue, Nov 21, 2006 at 02:51:56PM -0800, Linus Torvalds wrote:
> 
> 
> On Tue, 21 Nov 2006, Yann Dirson wrote:
> > 
> > I'm not sure I get the reason why the submodule should not be recorded
> > on "commit level".
> 
> Because that would be STUPID.
> 
> What does the submodules have to do with the commit level? Nothing. Nada. 
> Zero.

Oh, I see I may have expressed something in the wrong way :)
Namely, I brought an idea coming from partial merges into a discussion
on submodules, because when thinking about the former, I realized
we could maybe use similar mechanisms for both.

Note that the proposal I outlined did not break the tree, in that the
submodule tree is still in the same place.  In the case of a partial
merge, the info that a subtree has been merged in this commit is indeed
part of the commit itself.

I agree that the subtree case is somewhat different, and my idea may not
apply to submodules after all :)

A question would be, do "submodules" have to be permanent objects ?
I suppose it depends on what people want to use them for.  Indeed, the
"submodule" name strongly carries the idea of a permanent subset of the
repository.  My proposal for partial merges could be seen as using transient
submodules: they do not matter much during most of the repo life.

Put another way, I see the proposal of allowing tree entries to be
commits in addition to trees and blobs as akin to recording the submodule
_history_ inside the _tree_, which I feel precisely violates the
distinction you want to keep between those 2 concepts.

> And a sub-project simply doesn't even _do_ that. Much of the time, a 
> subproject stays constant, and is not something that comes and goes on an 
> individual commit basis. 

What about the case of a subproject that would evolve fast, and for
which we may not want intermediate versions to be part of the
supermodule ?  (just exploring an idea without real connection to the
one discussed above)

I mean, I have a tree in which the whole software for an embedded
platform is stored, including kernel, apps, etc.  While working on the
kernel, I may want to do several commits to that submodule, and may not want
to commit to the supermodule for each kernel commit, only when I feel the
kernel is stable enough.

One may argue I just have to use a branch.  Anyway, there will be a need
for submodule-specific branches - eg. kernel.org ones in my case.

An alternative would be to allow committing to the submodule without
creating matching supermodule commits, and let the user decide when he
wants to commit at the higher level.  That way, 2 successive supermodule
commits could have non-successive "subcommits".

Best regards,
-- 

* Re: [RFC] Submodules in GIT
  2006-11-21 23:54           ` Yann Dirson
@ 2006-11-22  3:40             ` Shawn Pearce
  2006-11-23 23:23               ` Yann Dirson
  0 siblings, 1 reply; 252+ messages in thread
From: Shawn Pearce @ 2006-11-22  3:40 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Linus Torvalds, Junio C Hamano, Jakub Narebski, git

Yann Dirson <ydirson@altern.org> wrote:
> Put it another way, I see the proposal of allowing tree entries to be
> commits in addition to trees and blobs, akin to recording the submodule
> _history_ inside the _tree_, which I feel precisely violates the
> distinction you want to keep between those 2 concepts.

No.  Linus is right.  Submodule commits belong in the tree.

We want to record a specific subtree within a larger tree.  There are
several ways we can refer to a tree: by its tree SHA1, by a commit
which points at the tree SHA1, by a tag which points at a
commit which points at the tree SHA1, or by a tag which points
at a tag which points at a commit which points at a tree SHA1.
Any of these is basically a tree-ish.

The advantage of linking to the commit-ish (commit or tag) and
not the tree-ish for a submodule is that it also provides you quick
access to answer the "how did this tree arrive at this state" question
as the answer cannot come solely from the top level commit chain.
The reason... keep reading...

> What about the case of a subproject that would evolve fast, and for
> which we may not want intermediate versions to be part of the
> supermodule ?  (just exploring an idea without real connection to the
> one discussed above)

Right.  The submodule is free to be committed to an infinite number
of times for any given commit in the supermodule.

It is expected that users will commit to a submodule say hundreds of
times for every commit they make to the supermodule.  Or thousands.
This is especially true if the submodule is some very large project,
e.g. the Linux kernel, and the supermodule "upgrades" the kernel it
is using after 3 months of staying on the same version.  Suddenly the
supermodule has only 1 commit which covers maybe 10,000 commits in
the submodule.

Yet we still want to be able to efficiently perform operations like
"git bisect" within the scope of that submodule, to help narrow down
a particular bug that is within that submodule.  To do that we need
the commit chain (all 10,000 of those commits) in the submodule.
To get those we really need a commit-ish and not a tree-ish, as
going from a tree-ish to a commit-ish is not only not unique but
is also pretty infeasible to do (you need to scan *every* commit).
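
A small sketch of why the reverse lookup is a problem, assuming a
made-up commit table (the ids "c1".."c3" and "T1"/"T2" are purely
illustrative).  Going from commit to tree is one lookup; going from
tree to commit means scanning everything, and may give several answers:

```python
# Hypothetical history: c3 reverts c2, so it has the same tree as c1.
COMMITS = {
    "c1": {"tree": "T1", "parents": []},
    "c2": {"tree": "T2", "parents": ["c1"]},
    "c3": {"tree": "T1", "parents": ["c2"]},
}


def tree_of(commit_id):
    """Commit -> tree: a single lookup."""
    return COMMITS[commit_id]["tree"]


def commits_with_tree(tree_id):
    """Tree -> commits: has to look at *every* commit."""
    return sorted(c for c, data in COMMITS.items() if data["tree"] == tree_id)


print(tree_of("c3"))            # T1
print(commits_with_tree("T1"))  # ['c1', 'c3'], so the answer is not unique
```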

-- 

* Re: [RFC] Submodules in GIT
  2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
                   ` (2 preceding siblings ...)
  2006-11-21  7:21 ` Shawn Pearce
@ 2006-11-22  5:29 ` Petr Baudis
  2006-12-02 20:16 ` Jakub Narebski
  4 siblings, 0 replies; 252+ messages in thread
From: Petr Baudis @ 2006-11-22  5:29 UTC (permalink / raw)
  To: git

On Mon, Nov 20, 2006 at 10:51:16PM CET, Martin Waitz wrote:
> The current status can be viewed in
> http://git.admingilde.org/tali/git.git/module2
> (on top of next)

(For those wondering, a diff of the changes is at:

	http://git.admingilde.org/tali/git.git/?a=commitdiff;h=module2;hp=next

Any bright ideas on how to best make such diffs reachable in the UI?)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
The meaning of Stonehenge in Traflamadorian, when viewed from above, is:
"Replacement part being rushed with all possible speed."

* Re: [RFC] Submodules in GIT
  2006-11-22  3:40             ` Shawn Pearce
@ 2006-11-23 23:23               ` Yann Dirson
  2006-11-25  6:53                 ` Shawn Pearce
  0 siblings, 1 reply; 252+ messages in thread
From: Yann Dirson @ 2006-11-23 23:23 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Linus Torvalds, Junio C Hamano, Jakub Narebski, git

On Tue, Nov 21, 2006 at 10:40:56PM -0500, Shawn Pearce wrote:
> Yet we still want to be able to efficiently perform operations like
> "git bisect" within the scope of that submodule, to help narrow down
> a particular bug that is within that submodule.  To do that we need
> the commit chain (all 10,000 of those commits) in the submodule.
> To get those we really need a commit-ish and not a tree-ish, as
> going from a tree-ish to a commit-ish is not only not unique but
> is also pretty infeasible to do (you need to scan *every* commit).

We don't need to have commits in the tree for this.  We'll just have
submodule commits which are not attached to a supermodule commit, and we
can access the whole submodule history through the submodule .git/HEAD,
just like we do for a standard git project.

Or do I miss something else ?

Best regards,
-- 

* Re: [RFC] Submodules in GIT
  2006-11-23 23:23               ` Yann Dirson
@ 2006-11-25  6:53                 ` Shawn Pearce
  2006-11-25 11:12                   ` Yann Dirson
  0 siblings, 1 reply; 252+ messages in thread
From: Shawn Pearce @ 2006-11-25  6:53 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Linus Torvalds, Junio C Hamano, Jakub Narebski, git

Yann Dirson <ydirson@altern.org> wrote:
> On Tue, Nov 21, 2006 at 10:40:56PM -0500, Shawn Pearce wrote:
> > Yet we still want to be able to efficiently perform operations like
> > "git bisect" within the scope of that submodule, to help narrow down
> > a particular bug that is within that submodule.  To do that we need
> > the commit chain (all 10,000 of those commits) in the submodule.
> > To get those we really need a commit-ish and not a tree-ish, as
> > going from a tree-ish to a commit-ish is not only not unique but
> > is also pretty infeasible to do (you need to scan *every* commit).
> 
> We don't need to have commits in the tree for this.  We'll just have
> submodule commits which are not attached to a supermodule commit, and we
> can access the whole submodule history through the submodule .git/HEAD,
> just like we do for a standard git project.

No.  You cannot do that.

How do we set up .git/HEAD when bisecting the supermodule?
Or merging it?  Or doing anything else with it?

Ideally the .git/HEAD of every submodule should seek to the commit
that points at the tree of the submodule which the supermodule
is referencing.  This lets you then perform a bisect within the
submodule when you identify the supermodule commit which caused
the breakage.

We need the submodule commits to do this.  Doing it without is
too expensive.

-- 

* Re: [RFC] Submodules in GIT
  2006-11-25  6:53                 ` Shawn Pearce
@ 2006-11-25 11:12                   ` Yann Dirson
  2006-11-25 18:57                     ` Linus Torvalds
  0 siblings, 1 reply; 252+ messages in thread
From: Yann Dirson @ 2006-11-25 11:12 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Linus Torvalds, Junio C Hamano, Jakub Narebski, git

On Sat, Nov 25, 2006 at 01:53:38AM -0500, Shawn Pearce wrote:
> Yann Dirson <ydirson@altern.org> wrote:
> > We don't need to have commits in the tree for this.  We'll just have
> > submodule commits which are not attached to a supermodule commit, and we
> > can access the whole submodule history through the submodule .git/HEAD,
> > just like we do for a standard git project.
> 
> No.  You cannot do that.
> 
> How do we setup .git/HEAD when bisecting the supermodule?
> Or merging it?  Or doing anything else with it?

Would there be any problem assuming git-update-ref would take care of
updating it ?

> Ideally the .git/HEAD of every submodule should seek to the commit
> that points at the tree of the submodule which the supermodule
> is referencing.

You mean, whenever we seek the HEAD of the supermodule, right ?

> This lets you then perform a bisect within the
> submodule when you identify the supermodule commit which caused
> the breakage.
 
That is, first bisect the supermodule (which naturally bisects the
submodule with rough granularity, assuming there are many submodule
commits for at least some supermodule commits), then bisect the submodule
between the two commits identified at supermodule level, right ?

> We need the submodule commits to do this.  Doing it without is
> too expensive.

Maybe I missed something again, but I'm still not convinced :)
-- 

* Re: [RFC] Submodules in GIT
  2006-11-25 11:12                   ` Yann Dirson
@ 2006-11-25 18:57                     ` Linus Torvalds
  2006-11-25 19:19                       ` Steven Grimm
  0 siblings, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-11-25 18:57 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Shawn Pearce, Junio C Hamano, Jakub Narebski, git

On Sat, 25 Nov 2006, Yann Dirson wrote:
> 
> > This lets you then perform a bisect within the
> > submodule when you identify the supermodule commit which caused
> > the breakage.
>  
> That is, first bisect the supermodule (which naturally bisects the
> submodule with rough granularity, assuming there are many submodule
> commits for at least some supermodule commits), then bisect the submodule
> between the two commits identified at supermodule level, right ?

Right. That is how you _must_ do it.

The reason is:

 - the supermodule will not track every release of the submodule. One of 
   the biggest reasons for using submodules in the first place is that the 
   submodules have their own development _independently_ of the 
   supermodule, and usually the supermodule will import new versions of 
   submodules only occasionally (eg the supermodule might choose to track 
   only major releases of the submodule, for example)

   (And yes, I realize that this is not necessarily the only submodule 
   usage: sometimes the submodules are literally _only_ developed as 
   submodules, and you'd never develop them independently. It depends on 
   the situation)

 - As a result of the above, you MUST NOT do bisection at the submodule 
   level at first: it's entirely possible that the supermodule never ever 
   actually used the submodule state at a finer granularity, and 
   "bisecting" into such state would be idiotic (it's really no different 
   from "bisecting" a regular commit by splitting up a commit into patches 
   against individual files - sure, it's a smaller granularity, but it's a 
   granularity that never _existed_, and was never tested or intended to 
   work!)
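
The two-level bisection can be sketched roughly like this.  It is a
simplification: it assumes a linear submodule history numbered 0..100,
a supermodule that only ever moves the submodule forward, and a bug that
stays present once introduced.  The pin list and the culprit number 73
are invented for the example:

```python
def first_bad(items, is_bad):
    """Binary search; items[0] is known good and items[-1] known bad.

    Returns the index of the first bad item.
    """
    lo, hi = 0, len(items) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid
    return hi


def is_bad(sub_commit):
    # Hypothetical test: the bug went in with submodule commit 73.
    return sub_commit >= 73


# Submodule commit recorded by each supermodule commit, oldest first.
super_pins = [0, 40, 40, 100]

# Step 1: bisect the supermodule, only testing states it really recorded.
bad = first_bad(super_pins, is_bad)
good_pin, bad_pin = super_pins[bad - 1], super_pins[bad]
print(good_pin, bad_pin)  # 40 100

# Step 2: bisect the submodule between the two recorded commits.
sub_range = list(range(good_pin, bad_pin + 1))
culprit = sub_range[first_bad(sub_range, is_bad)]
print(culprit)  # 73
```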

So yes, you should expect that

 (a) submodule changes "jump around" in the supermodule - even to the 
     point of going backwards in time as far as the submodule is concerned 
     (ie the supermodule might have tested a new release of a submodule, 
     committed that, found a problem, and decided to just go back to an 
     earlier version of the submodule again, and committed that again)

 (b) This implies very much that there can be a n:m relationship between 
     submodule and supermodule commits. A supermodule commit does _not_ 
     imply a commit in the submodule (it might commit changes to the 
     top-level makefile or to _another_ submodule), but equally, a 
     submodule commit does _not_ imply a commit in the supermodule 
     (because the submodule might be independently changed in some other 
     repository where it's the _primary_ development, not a submodule)

So you shouldn't expect submodules to be very "tightly" coupled, and I 
don't think you even want the workflow to _be_ that tight. I think it's ok 
if submodules show up as such, and that "git diff" etc don't try to make 
it all "seamless".

It often _shouldn't_ be seamless: you should be able to commit to a 
supermodule without committing the submodule state: it's really no 
different from committing individual files (it might be something that is 
_discouraged_ as a workflow for some project, the same way you might 
discourage using "git commit one/file" over "git commit -a", and for the 
same reason: you're committing some state that doesn't match what your 
tree actually looks like).

Similarly, doing a "git commit -a" within a submodule should really just 
commit _that_ submodule, and not even _try_ to know about supermodules 
etc, because the submodule really should be a totally independent git 
repository.

[ Side note: you may well want to set up submodules so that they share the 
  object store with the supermodule: that may be the simplest way to make 
  operations that traverse things recursively work out, since it means 
  that you can do object lookups for everything you traverse without 
  having to even think about it.

  On the other hand, this could equally easily be done by just making 
  every submodule an "alternates" directory in the supermodule: that keeps 
  the object databases separate, but means that anybody in the supermodule 
  will always be able to look up all the objects in the submodules. So 
  even here, we certainly _can_ keep things separated, without even 
  introducing any new concepts. ]

So I actually think that submodules should at least start out as something 
rather independent, where a "commit -a" in the supermodule will _only_ 
commit the supermodule itself - and if you haven't committed the submodule 
yet, you'll just get the current HEAD state of the submodule.

Add some trivial help in "git status" to _warn_ about the fact that 
submodules haven't been committed and are dirty, but I really think that 
it should be a very explicit thing where you really do see things as 
submodules, not as "one big module".

* Re: [RFC] Submodules in GIT
  2006-11-25 18:57                     ` Linus Torvalds
@ 2006-11-25 19:19                       ` Steven Grimm
  2006-11-25 19:30                         ` Linus Torvalds
  0 siblings, 1 reply; 252+ messages in thread
From: Steven Grimm @ 2006-11-25 19:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds wrote:
> So I actually think that submodules should at least start out as something 
> rather independent, where a "commit -a" in the supermodule will _only_ 
> commit the supermodule itself - and if you haven't committed the submodule 
> yet, you'll just get the current HEAD state of the submodule.

That would make it impossible to atomically commit a change that affects 
two submodules, yes? I think cross-submodule commit is highly desirable 
and will be a fairly common use case for submodules if it's supported. 
For example, if you have "client" and "server" submodules and someone 
makes a protocol change, you don't want some unwitting developer to pull 
just half of the change and end up with incompatible code in the two 
submodules.

I have no problem with making the "only commit the supermodule" behavior 
the default and requiring a command-line option for the "commit 
everything" case, but I think "commit everything" is useful. And 
honestly IMO it should be the default since it'll behave in a less 
surprising way; when I do a "commit -a" I expect all my changes to be 
committed, whether they're in submodules or not.

* Re: [RFC] Submodules in GIT
  2006-11-25 19:19                       ` Steven Grimm
@ 2006-11-25 19:30                         ` Linus Torvalds
  2006-11-25 23:49                           ` Yann Dirson
  0 siblings, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-11-25 19:30 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git

On Sat, 25 Nov 2006, Steven Grimm wrote:

> Linus Torvalds wrote:
> > So I actually think that submodules should at least start out as something
> > rather independent, where a "commit -a" in the supermodule will _only_
> > commit the supermodule itself - and if you haven't committed the submodule
> > yet, you'll just get the current HEAD state of the submodule.
> 
> That would make it impossible to atomically commit a change that affects two
> submodules, yes?

No. Quite the reverse. What you do is:

 (a) commit both submodules INDEPENDENTLY.

 (b) then commit the supermodule that contains the submodules.

And note how the important part here is that committing in a submodule 
DOES NOT AFFECT THE SUPERMODULE AT ALL!

The git trees are _independent_. That's important. You should _not_ try to 
mix them up and make a commit in one commit anything AT ALL in some other 
tree, exactly because it gets impossible to do (a) interesting things and 
(b) atomic commits otherwise.

Note that this is true also in the case of a submodule that itself 
contains a submodule. That doesn't change anything - you still need to be 
able to view _each_ layer as an independent thing.
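
The two phases above can be sketched with a toy model. This is only an
illustration under stated assumptions: the Repo class, commit_supermodule
and the commit ids are hypothetical and are not part of git.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository that only remembers its own commits (hypothetical model)."""
    name: str
    log: list = field(default_factory=list)  # (commit id, message) pairs

    def commit(self, message):
        # A commit extends this repository's history and nothing else.
        commit_id = f"{self.name}-{len(self.log) + 1}"
        self.log.append((commit_id, message))
        return commit_id

    @property
    def head(self):
        return self.log[-1][0] if self.log else None


def commit_supermodule(top, submodules, message):
    """Step (b): snapshot every submodule HEAD, then commit the supermodule."""
    snapshot = {sub.name: sub.head for sub in submodules}
    return top.commit(message), snapshot


client, server, top = Repo("client"), Repo("server"), Repo("super")

# Step (a): each submodule is committed independently, with its own message.
client.commit("client: speak protocol v2")
server.commit("server: accept protocol v2")
untouched = top.head is None  # the supermodule has not changed so far

# Step (b): a single supermodule commit ties both new HEADs together.
super_id, snapshot = commit_supermodule(top, [client, server], "switch to protocol v2")
print(untouched, super_id, snapshot)
```

In this model the atomicity comes from step (b) alone: anyone who checks out
the one supermodule commit gets both submodule commits or neither.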

* Re: [RFC] Submodules in GIT
  2006-11-25 19:30                         ` Linus Torvalds
@ 2006-11-25 23:49                           ` Yann Dirson
  2006-11-26  1:14                             ` Sven Verdoolaege
  2006-11-26  3:39                             ` Linus Torvalds
  0 siblings, 2 replies; 252+ messages in thread
From: Yann Dirson @ 2006-11-25 23:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Steven Grimm, git

On Sat, Nov 25, 2006 at 11:30:47AM -0800, Linus Torvalds wrote:
> The git trees are _independent_. That's important.

I'm not sure how independent you mean them to be.  The approach I've
tried to describe so far assumes that, although you can look at each
submodule independently from the supermodule or any other submodule, you
can still look at the supermodule as a single tree of its own.

E.g., so that if one part of an appliance/ module ends up promoted to a
lib/ module, GIT can still show that as a move within the supermodule.
If we insist that the submodules get committed independently before we
make a supermodule commit tying those together, I fear it may make things
like "move/copy detection" trickier?

Also, I'd rather expect "git-commit -a" outside of any submodule to
commit everything in the supermodule, triggering submodule commits as an
intermediate step when needed - just like "git-commit -a" does not
require manually specifying subdirectories to include in the commit.  I'd
rather expect a special flag to exclude submodules from a commit.

Best regards,
--

* Re: [RFC] Submodules in GIT
  2006-11-25 23:49                           ` Yann Dirson
@ 2006-11-26  1:14                             ` Sven Verdoolaege
  2006-11-26  1:32                               ` Yann Dirson
  2006-11-26  3:39                             ` Linus Torvalds
  1 sibling, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-11-26  1:14 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Linus Torvalds, Steven Grimm, git

FWIW, here's my view on this issue.

On Sun, Nov 26, 2006 at 12:49:08AM +0100, Yann Dirson wrote:
> Also, I'd rather expect "git-commit -a" outside of any submodule to
> commit everything in the supermodule, triggering submodule commits as an
> intermediate step when needed - just like "git-commit -a" does not
> require manually specifying subdirectories to include in the commit.  I'd
> rather expect a special flag to exclude submodules from a commit.

A commit should record the content changes that have been made, not change
any content itself.  Some VCSs change the contents of a file when you
commit them (e.g., keyword substitution).  Git, rightly, doesn't do that.
Likewise, when you commit in the superproject, it should simply record
the changes to the "content" of the subproject and not change it.
And the content of the subproject is a commit, so a commit in the
superproject should not change the content of the subproject by creating
another commit in the subproject.

* Re: [RFC] Submodules in GIT
  2006-11-26  1:14                             ` Sven Verdoolaege
@ 2006-11-26  1:32                               ` Yann Dirson
  0 siblings, 0 replies; 252+ messages in thread
From: Yann Dirson @ 2006-11-26  1:32 UTC (permalink / raw)
  To: skimo; +Cc: Linus Torvalds, Steven Grimm, git

On Sun, Nov 26, 2006 at 02:14:20AM +0100, Sven Verdoolaege wrote:
> Likewise, when you commit in the superproject, it should simply record
> the changes to the "content" of the subproject and not change it.
> And the content of the subproject is a commit, so a commit in the
> superproject should not change the content of the subproject by creating
> another commit in the subproject.

After suggesting that, I realized how inadequate the idea was -
sorry for the noise.

However, I'm not yet buying the idea that "the content of the subproject
is a commit" :)

Best regards,
-- 

* Re: [RFC] Submodules in GIT
  2006-11-25 23:49                           ` Yann Dirson
  2006-11-26  1:14                             ` Sven Verdoolaege
@ 2006-11-26  3:39                             ` Linus Torvalds
  2006-11-26  8:05                               ` Daniel Barkalow
  1 sibling, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-11-26  3:39 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Steven Grimm, git



On Sun, 26 Nov 2006, Yann Dirson wrote:
> 
> Also, I'd rather expect "git-commit -a" outside of any submodule to
> commit everything in the supermodule, triggering submodule commits as an
> intermediate step when needed - just like "git-commit -a" does not
> require manually specifying subdirectories to include in the commit.  I'd
> rather expect a special flag to exclude submodules from a commit.

So, how do you do commit messages? It generally doesn't make sense to 
share the same commit message for submodules - the sub-commits generally 
do different things.

I'd actually suggest that "git commit -a" with non-clean submodules error 
out for that reason, with something like

	submodule 'src/xyzzy' is not up-to-date, please commit changes to 
	that first.

exactly because you really generally should consider the submodule commits 
to be a separate phase.


* Re: [RFC] Submodules in GIT
  2006-11-26  3:39                             ` Linus Torvalds
@ 2006-11-26  8:05                               ` Daniel Barkalow
  2006-11-28  9:36                                 ` Andreas Ericsson
  0 siblings, 1 reply; 252+ messages in thread
From: Daniel Barkalow @ 2006-11-26  8:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Yann Dirson, Steven Grimm, git

On Sat, 25 Nov 2006, Linus Torvalds wrote:

> On Sun, 26 Nov 2006, Yann Dirson wrote:
> > 
> > Also, I'd rather expect "git-commit -a" outside of any submodule to
> > commit everything in the supermodule, triggering submodule commits as an
> > intermediate step when needed - just like "git-commit -a" does not
> > require manually specifying subdirectories to include in the commit.  I'd
> > rather expect a special flag to exclude submodules from a commit.
> 
> So, how do you do commit messages? It generally doesn't make sense to 
> share the same commit message for submodules - the sub-commits generally 
> do different things.

The same way you do the first commit message. Ask independently for each 
commit message in sequence with enough context in the comment section that 
you know what you're talking about.

> I'd actually suggest that "git commit -a" with non-clean submodules error 
> out for that reason, with something like
> 
> 	submodule 'src/xyzzy' is not up-to-date, please commit changes to 
> 	that first.
> 
> exactly because you really generally should consider the submodule commits 
> to be a separate phase.

I think this is getting close to the classic usability blunder of having 
the program tell you what you should have done instead of what you did, 
and then making you do it yourself, rather than just doing it.

Just have it run "git commit -a" in each dirty submodule recursively as 
part of preparing the index, since that's what the user wants to do 
anyway, and nothing already done would be affected.

"git commit -a -m <message>" should probably fail, of course.

	-Daniel

* Re: [RFC] Submodules in GIT
  2006-11-26  8:05                               ` Daniel Barkalow
@ 2006-11-28  9:36                                 ` Andreas Ericsson
  2006-11-28 10:29                                   ` Andy Parkins
  2006-11-28 17:28                                   ` Daniel Barkalow
  0 siblings, 2 replies; 252+ messages in thread
From: Andreas Ericsson @ 2006-11-28  9:36 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, Yann Dirson, Steven Grimm, git

Daniel Barkalow wrote:
> On Sat, 25 Nov 2006, Linus Torvalds wrote:
> 
>> On Sun, 26 Nov 2006, Yann Dirson wrote:
>>> Also, I'd rather expect "git-commit -a" outside of any submodule to
>>> commit everything in the supermodule, triggering submodule commits as an
>>> intermediate step when needed - just like "git-commit -a" does not
>>> require manually specifying subdirectories to include in the commit.  I'd
>>> rather expect a special flag to exclude submodules from a commit.
>> So, how do you do commit messages? It generally doesn't make sense to 
>> share the same commit message for submodules - the sub-commits generally 
>> do different things.
> 
> The same way you do the first commit message. Ask independently for each 
> commit message in sequence with enough context in the comment section that 
> you know what you're talking about.
> 
>> I'd actually suggest that "git commit -a" with non-clean submodules error 
>> out for that reason, with something like
>>
>> 	submodule 'src/xyzzy' is not up-to-date, please commit changes to 
>> 	that first.
>>
>> exactly because you really generally should consider the submodule commits 
>> to be a separate phase.
> 
> I think this is getting close to the classic usability blunder of having 
> the program tell you what you should have done instead of what you did, 
> and then making you do it yourself, rather than just doing it.
> 
> Just have it run "git commit -a" in each dirty submodule recursively as 
> part of preparing the index, since that's what the user wants to do 
> anyway, and nothing already done would be affected.
> 

Running "commit -a" is definitely the wrong thing to do, as it prevents 
one from using the index at all. Erroring out if the submodules are 
dirty, or just accepting the fact that they are and taking whatever 
commit HEAD points to is *always* preferable.

I'd actually prefer the second solution here and let git print a list of 
submodules with dirty state and ask for some sort of user-response 
before creating the actual commit. As non-interactive commits should 
always be clean, requiring user intervention on non-clean state should 
be a safe thing to do.

> "git commit -a -m <message>" should probably fail, of course.
> 

Why? There's no reason to rob this command of its power just because 
we're using submodules.

> 	-Daniel
> *This .sig left intentionally blank*
> 

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

* Re: [RFC] Submodules in GIT
  2006-11-28  9:36                                 ` Andreas Ericsson
@ 2006-11-28 10:29                                   ` Andy Parkins
  2006-11-28 10:50                                     ` Jakub Narebski
  2006-11-28 17:28                                   ` Daniel Barkalow
  1 sibling, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-11-28 10:29 UTC (permalink / raw)
  To: git

On Tuesday 2006 November 28 09:36, Andreas Ericsson wrote:

> I'd actually prefer the second solution here and let git print a list of
> submodules with dirty state and ask for some sort of user-response
> before creating the actual commit. As non-interactive commits should
> always be clean, requiring user intervention on non-clean state should
> be a safe thing to do.

I'd agree.  However, is there a need to require user intervention?  Can we not 
make the following analogies to normal git operation:

file in working directory -> submodule working directory
file in index -> submodule repository

It's perfectly possible to make a commit with different contents in the index 
and the working directory - it shows up in the git-status output very nicely.  
Why not deal with submodules in the same way?

Now imagine the following repository:
  file1
  file2
  submodule1/file3
Make changes to file2 and file3, but don't update-index or commit.  git-status 
would show:

# Changed but not updated:
#   (use git-update-index to mark for commit)
#
#       modified:   file2
#       dirty:      submodule1
#
nothing to commit

Now "git-update-index file2" and git-status

# Updated but not checked in:
#   (will commit)
#
#       modified:   file2
#
# Changed but not updated:
#   (use git-update-index to mark for commit)
#
#       dirty:      submodule1/
#

Now do a commit in submodule1/ and git-status in the supermodule.

# Updated but not checked in:
#   (will commit)
#
#       modified:   file2
#       submodule:  submodule1/
#

Obviously the detail would be different, but you get the idea.  There is 
almost no difference between git-with-submodules and git-as-normal.

I suppose there would actually need to be an extra step where the submodule is 
added to the supermodule index.  So really there would be three states from 
git-status:

# Updated but not checked in:
#   (will commit)
#
#       modified:   file2
#
# Changed but not updated:
#   (use git-update-index to mark for commit)
#
#       modified:   file1
#       submodule:  submodule1/
#
# Dirty submodules:
#   (commit changes in the submodule to clean)
#
#       dirty:      submodule1/
#

Which means: since the last supermodule commit there has been
 * a change to file2, which is in the index and would be committed.
 * a change to file1, which is not in the index and won't be committed.
 * a commit to submodule1, which won't be committed
 * changes to the submodule working directory

This really reinforces Linus's interpretation that submodules are 
directories - they would presumably just get a new object type and be 
referenced in the tree object.  git-update-index would be blind to dirty 
submodules with no new commit, just as git-update-index on an unchanged file 
has no effect.
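
A rough sketch of how those sections could be derived for one submodule.
The function name, its parameters and the commit ids are assumptions made
for illustration; this is not how git-status is implemented.

```python
def submodule_status(committed, indexed, sub_head, sub_dirty):
    """Return the git-status sections a submodule would be listed under.

    committed -- submodule commit recorded by the last supermodule commit
    indexed   -- submodule commit currently in the supermodule index
    sub_head  -- the submodule's current HEAD commit
    sub_dirty -- True if the submodule has uncommitted changes of its own
    """
    sections = []
    if indexed != committed:
        sections.append("Updated but not checked in")
    if sub_head != indexed:
        sections.append("Changed but not updated")
    if sub_dirty:
        sections.append("Dirty submodules")
    return sections


# A new submodule commit that is not yet in the supermodule index, plus
# further uncommitted edits inside the submodule.
print(submodule_status("c1", "c1", "c2", True))

# The same submodule after update-index has been run in the supermodule.
print(submodule_status("c1", "c2", "c2", True))

# A dirty submodule with no new commit: update-index has nothing to pick up.
print(submodule_status("c1", "c1", "c1", True))
```

The last call shows the "blind to dirty submodules" case: without a new
commit, only the dirty section mentions the submodule.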

Has this question been answered yet?  How does the supermodule know which 
branch to track in the submodule?  Does it simply track HEAD or when the 
submodule is added to the supermodule is it told which branch to track?  I 
suppose it's got to be HEAD really hasn't it?

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE

* Re: [RFC] Submodules in GIT
  2006-11-28 10:29                                   ` Andy Parkins
@ 2006-11-28 10:50                                     ` Jakub Narebski
  2006-11-28 13:35                                       ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-11-28 10:50 UTC (permalink / raw)
  To: git

Andy Parkins wrote:

>                                  How does the supermodule know which 
> branch to track in the submodule?  Does it simply track HEAD or when the 
> submodule is added to the supermodule is it told which branch to track?  I 
> suppose it's got to be HEAD really hasn't it?

I think that the proper place for that would be supermodule _index_.
The supermodule tree would have commit entry, and the index would have
symbolic branch (and perhaps some info about where to find refs for
submodule).

This I guess breaks index abstraction slightly, but on the other hand
allows for tracking non-HEAD branch of submodule, and for submodule to
not know about supermodule at all...
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: [RFC] Submodules in GIT
  2006-11-28 10:50                                     ` Jakub Narebski
@ 2006-11-28 13:35                                       ` Andy Parkins
  2006-11-28 15:44                                         ` Shawn Pearce
                                                           ` (2 more replies)
  0 siblings, 3 replies; 252+ messages in thread
From: Andy Parkins @ 2006-11-28 13:35 UTC (permalink / raw)
  To: git

On Tuesday 2006 November 28 10:50, Jakub Narebski wrote:

> I think that the proper place for that would be supermodule _index_.
> The supermodule tree would have commit entry, and the index would have
> symbolic branch (and perhaps some info about where to find refs for
> submodule).
>
> This I guess breaks index abstraction slightly, but on the other hand
> allows for tracking non-HEAD branch of submodule, and for submodule to
> not know about supermodule at all...

The reason I thought it would have to be HEAD at all times, is to prevent 
situations where the supermodule commit doesn't reflect the state of the 
current tree.

Let's imagine that we're doing non-HEAD tracking in the supermodule.
  supermodule
   +-------- libsubmodule1
   +-------- libsubmodule2
So, you do a "make" in supermodule; this of course will call make in each of 
the submodules.  You test the output and find that it's all working nicely.  
Time for a supermodule commit.  We want to freeze this working state.  You 
commit and tag "supermodule-rc1"

Unfortunately, during development, you've switched libsubmodule1 to 
branch "development", but supermodule isn't tracking libsubmodule1/HEAD; it's 
tracking libsubmodule1/master.  Your supermodule commit doesn't capture a 
snapshot of the tree you're using.

Now you say to the mailing list, "hey guys, can you test 'supermodule-rc1'?"
They check it out, and find that everything is broken.  Why?  Because what 
you wanted to check in was libsubmodule1@development, but what actually went 
in was libsubmodule1@master.

I think I've talked myself into the position where it definitely has to be 
HEAD being tracked in the submodules; anything else is a disaster waiting to 
happen because commit doesn't check in your current tree.
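
The failure can be shown with a small model. Everything here is
hypothetical: the dictionary layout, the commit ids and the "track"
parameter are assumptions for illustration, not an existing git option.

```python
def recorded_commit(submodule, track):
    """Commit a supermodule would record for `submodule`.

    `track` is either "HEAD" (whatever is checked out) or a branch name.
    """
    if track == "HEAD":
        return submodule["branches"][submodule["checked_out"]]
    return submodule["branches"][track]


libsubmodule1 = {
    "branches": {"master": "aaa111", "development": "bbb222"},
    "checked_out": "development",  # the tree that "make" built and you tested
}

tested = libsubmodule1["branches"][libsubmodule1["checked_out"]]

# Tracking a fixed branch records a commit that was never built or tested.
print(recorded_commit(libsubmodule1, "master") == tested)

# Tracking HEAD records exactly what is in the working tree.
print(recorded_commit(libsubmodule1, "HEAD") == tested)
```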

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE

* Re: [RFC] Submodules in GIT
  2006-11-28 13:35                                       ` Andy Parkins
@ 2006-11-28 15:44                                         ` Shawn Pearce
  2006-11-28 16:29                                           ` Andy Parkins
  2006-11-28 19:58                                         ` Steven Grimm
  2006-11-29 16:03                                         ` Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: Shawn Pearce @ 2006-11-28 15:44 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins <andyparkins@gmail.com> wrote:
> I think I've talked myself into the position where it definitely has to be 
> HEAD being tracked in the submodules; anything else is a disaster waiting to 
> happen because commit doesn't check in your current tree.

Yes, but not only that, HEAD is the only thing that fits with the
rest of the git repository/index/working directory model.

Let's review...

What's HEAD?  It's the commit which matches the index state as
closely as possible, with the only differences being the changes in
progress that are being prepared for the next commit (whose parent
will be HEAD).  If the index and working directory are both clean
(no changes) then it's also the current content of this directory,
right?

What's the index?  It's what you are about to commit.

What's the working directory?  It's the current content, which may
also be partially checked out or dirty.

So HEAD in a submodule is the current content of that submodule.
Therefore any update-index call on a submodule should load HEAD
(totally ignoring whatever branch it refers to) into the supermodule
index.

-- 

* Re: [RFC] Submodules in GIT
  2006-11-28 15:44                                         ` Shawn Pearce
@ 2006-11-28 16:29                                           ` Andy Parkins
  2006-11-28 16:36                                             ` Shawn Pearce
                                                               ` (3 more replies)
  0 siblings, 4 replies; 252+ messages in thread
From: Andy Parkins @ 2006-11-28 16:29 UTC (permalink / raw)
  To: git; +Cc: Shawn Pearce

On Tuesday 2006 November 28 15:44, Shawn Pearce wrote:

> So HEAD in a submodule is the current content of that submodule.
> Therefore any update-index call on a submodule should load HEAD
> (totally ignoring whatever branch it refers to) into the supermodule
> index.

I was with you right up until here.

Why should a submodule do anything to the supermodule?  This is like saying, 
when I edit a working tree file, it should automatically call update-index.  
The supermodule index should only be updated in response to a manual 
update-index (or commit -a I suppose).

Worse, if you allow that to happen, the supermodule can commit a state that 
cannot be retrieved from the submodule's repository.  The ONLY thing a 
supermodule can record about a submodule is a commit.  Changing the index 
doesn't create a commit, so it can't change anything in the supermodule.

If you change the submodule index then that submodule is "dirty", this state 
has no parallel with normal git operation.  The nearest thing is that you've 
changed a file but not saved it.  Apart from showing the "dirty" state in the 
supermodule's git-status, I don't see that there is anything that the 
supermodule can do - it can't go around committing in a repository that is 
not itself.

IMO, it should always be possible to take a submodule and work on it in 
isolation - in an extreme case, by moving it out of the supermodule tree 
entirely.

In summary, from the supermodule's point of view:
 * A submodule with changed working directory is "dirty-wd"
 * A submodule with changed index is "dirty-idx"
 * A submodule with changed HEAD (since the last supermodule commit) 
   is "changed but not updated" and can hence be "update-index"ed into the
   supermodule
 * A submodule with changed HEAD that has been added to the supermodule index
   is "updated but not checked in"
 * A submodule with changed HEAD (since the last supermodule update-index) is
   both "changed but not updated" _and_ "updated but not checked in", just 
   like any normal file.

What's needed then:
 * A way of telling git to treat a particular directory as a submodule instead
   of a directory
 * git-status gets knowledge of how to check for "dirty" submodules
 * git-commit-tree learns about how to store "submodule" object types in
   trees.  The submodule object type will be nothing more than the hash of the
   current HEAD commit.  (This might be my ignorance, perhaps it's just 
   update-index that needs to know this)

I don't know enough about the plumbing to know if my description above is 
using the right nomenclature - I'm sure someone will correct me.

In my head, it would look something like this:

$ mkdir supermodule; cd supermodule
$ git init-db
$ git clone proto://host/submodule.git
$ git add --submodule submodule
$ git update-index submodule
$ git commit -m "Added submodule to supermodule"
[ edit submodule ]
$ git status
submodule is dirty, the working directory has changed
[ update-index in submodule ]
$ git status
submodule is dirty, the index has changed
[ commit in submodule ]
$ git status
submodule is changed but not updated
$ git update-index submodule
$ git status
submodule is updated but not checked in
$ git commit -m "Record submodule change in supermodule"
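
The same walkthrough as a small state model. This is a sketch under
stated assumptions: the field names, the tree ids (t1, t2) and the commit
ids (c1, c2) are made up, and the messages mirror the session above rather
than real git output.

```python
def supermodule_view(s):
    """Status of a submodule as seen from the supermodule (first match wins)."""
    if s["worktree"] != s["index"]:
        return "dirty, the working directory has changed"
    if s["index"] != s["head_tree"]:
        return "dirty, the index has changed"
    if s["head"] != s["super_index"]:
        return "changed but not updated"
    if s["super_index"] != s["super_head"]:
        return "updated but not checked in"
    return "clean"


# Everything starts in sync: tree t1 is committed as c1 in the submodule,
# and the supermodule has recorded c1.
state = dict(worktree="t1", index="t1", head_tree="t1", head="c1",
             super_index="c1", super_head="c1")

steps = [
    ("edit submodule", {"worktree": "t2"}),
    ("update-index in submodule", {"index": "t2"}),
    ("commit in submodule", {"head_tree": "t2", "head": "c2"}),
    ("update-index in supermodule", {"super_index": "c2"}),
    ("commit in supermodule", {"super_head": "c2"}),
]

report = []
for action, change in steps:
    state.update(change)
    report.append((action, supermodule_view(state)))

for action, view in report:
    print(f"{action}: submodule is {view}")
```

For brevity the model reports only the first matching state, although the
summary above allows several to hold at once.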

Am I crazy?

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE

* Re: [RFC] Submodules in GIT
  2006-11-28 16:29                                           ` Andy Parkins
@ 2006-11-28 16:36                                             ` Shawn Pearce
  2006-11-28 17:38                                             ` Jon Loeliger
                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 252+ messages in thread
From: Shawn Pearce @ 2006-11-28 16:36 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins <andyparkins@gmail.com> wrote:
> On Tuesday 2006 November 28 15:44, Shawn Pearce wrote:
> 
> > So HEAD in a submodule is the current content of that submodule.
> > Therefore any update-index call on a submodule should load HEAD
> > (totally ignoring whatever branch it refers to) into the supermodule
> > index.
> 
> I was with you right up until here.
> 
> Why should a submodule do anything to the supermodule?  This is like saying, 
> when I edit a working tree file, it should automatically call update-index.  
> The supermodule index should only be updated in response to a manual 
> update-index (or commit -a I suppose).

You misread my poorly written statement.  :-)

What I meant to say was that update-index run in the supermodule
would load the submodule content into the supermodule index; much
as an update-index on a file would load the content of that file
into the index.
 
> IMO, it should always be possible to take a submodule and work on it in 
> isolation - in an extreme case, by moving it out of the supermodule tree 
> entirely.

Aside from sharing object directories, yes.
 
> In summary, from the supermodule's point of view:
>  * A submodule with changed working directory is "dirty-wd"
>  * A submodule with changed index is "dirty-idx"
>  * A submodule with changed HEAD (since the last supermodule commit) 
>    is "changed but not updated" and can hence be "update-index"ed into the
>    supermodule
>  * A submodule with changed HEAD that has been added to the supermodule index
>    is "updated but not checked in"
>  * A submodule with changed HEAD (since the last supermodule update-index) is
>    both "changed but not updated" _and_ "updated but not checked in", just 
>    like any normal file.
> 
> What's needed then:
>  * A way of telling git to treat a particular directory as a submodule instead
>    of a directory
>  * git-status gets knowledge of how to check for "dirty" submodules
>  * git-commit-tree learns about how to store "submodule" object types in
>    trees.  The submodule object type will be nothing more than the hash of the
>    current HEAD commit.  (This might be my ignorance, perhaps it's just 
>    update-index that needs to know this)

Err, uhm, more like git-write-tree.  git-commit-tree doesn't
care about the tree content.  And all of the tree reading code.
And all object traversal code (e.g. rev-list --objects).  Martin
Waitz's submodule prototype has been working on those details.
It's non-trivial due to the number of locations affected.

> In my head, it would look something like this:
> 
> $ mkdir supermodule; cd supermodule
> $ git init-db
> $ git clone proto://host/submodule.git
> $ git add --submodule submodule
> $ git update-index submodule
> $ git commit -m "Added submodule to supermodule"
> [ edit submodule ]
> $ git status
> submodule is dirty, the working directory has changed
> [ update-index in submodule ]
> $ git status
> submodule is dirty, the index has changed
> [ commit in submodule ]
> $ git status
> submodule is changed but not updated
> $ git update-index submodule
> $ git status
> submodule is updated but not checked in
> $ git commit -m "Record submodule change in supermodule"

Yes, exactly my thoughts on the matter.
 
> Am I crazy?

Maybe, but I'm not a shrink.  Your email looked sane.  :-)

-- 

* Re: [RFC] Submodules in GIT
  2006-11-28  9:36                                 ` Andreas Ericsson
  2006-11-28 10:29                                   ` Andy Parkins
@ 2006-11-28 17:28                                   ` Daniel Barkalow
  2006-11-28 18:08                                     ` Sven Verdoolaege
  1 sibling, 1 reply; 252+ messages in thread
From: Daniel Barkalow @ 2006-11-28 17:28 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Linus Torvalds, Yann Dirson, Steven Grimm, git

On Tue, 28 Nov 2006, Andreas Ericsson wrote:

> Daniel Barkalow wrote:
> > On Sat, 25 Nov 2006, Linus Torvalds wrote:
> > > I'd actually suggest that "git commit -a" with non-clean submodules error
> > > out for that reason
> > 
> > Just have it run "git commit -a" in each dirty submodule recursively as part
> > of preparing the index, since that's what the user wants to do anyway, and
> > nothing already done would be affected.
> > 
> 
> Running "commit -a" is definitely the wrong thing to do, as it prevents one
> from using the index at all. Erroring out if the submodules are dirty, or just
> accepting the fact that they are and taking whatever commit HEAD points to is
> *always* preferable.

I don't think anyone would actually use the index in submodules but not in 
the supermodule. If submodules are seen mostly as ordinary directories as 
far as the supermodule's working directory is concerned, it wouldn't make 
sense to not commit dirty state in a subdirectory with -a just because 
it's a submodule.

It would be wrong to do "commit -a" in submodules if the supermodule 
weren't being committed with -a, of course.

> > "git commit -a -m <message>" should probably fail, of course.
> > 
> 
> Why? There's no reason to rob this command of its power just because we're
> using submodules.

It should fail if there are dirty submodules, because the user needs to 
provide a commit message for each of them, and only one commit message can 
be provided this way, and -m inhibits invoking an editor.
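
A sketch of that rule, assuming a recursive "commit -a" of the kind
proposed in this thread. The function name and the plan format are
hypothetical and do not describe existing git behaviour.

```python
def plan_commit_all(dirty_submodules, message=None):
    """Plan a hypothetical recursive "git commit -a".

    Returns (repository, message source) pairs in commit order.  Raises
    ValueError when a single -m message would have to serve several commits.
    """
    if message is not None and dirty_submodules:
        needed = len(dirty_submodules) + 1
        raise ValueError(f"-m supplies one message, but {needed} commits are needed")
    # Dirty submodules are committed first, each with its own editor session.
    plan = [(name, "editor") for name in dirty_submodules]
    plan.append(("supermodule", "editor" if message is None else "-m"))
    return plan


print(plan_commit_all(["client", "server"]))
print(plan_commit_all([], message="bump version"))

try:
    plan_commit_all(["client"], message="bump version")
except ValueError as err:
    print("refused:", err)
```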

	-Daniel

* Re: [RFC] Submodules in GIT
  2006-11-28 16:29                                           ` Andy Parkins
  2006-11-28 16:36                                             ` Shawn Pearce
@ 2006-11-28 17:38                                             ` Jon Loeliger
  2006-11-29 16:15                                             ` Martin Waitz
  2006-11-30 11:57                                             ` sf
  3 siblings, 0 replies; 252+ messages in thread
From: Jon Loeliger @ 2006-11-28 17:38 UTC (permalink / raw)
  To: Andy Parkins; +Cc: Git List, Shawn Pearce

On Tue, 2006-11-28 at 10:29, Andy Parkins wrote:

> IMO, it should always be possible to take a submodule and work on it in 
> isolation - in an extreme case, by moving it out of the supermodule tree 
> entirely.

This seems to me to be tantamount to saying something like:

    We need a "recursively defined git repository" that
    is representable as a git repository.

That is, can the tree object be changed from containing
just "blob" and "tree" references to also having a new
"git" reference as well?
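
One way to picture such a tree, as a sketch only. The TreeEntry type, the
"git" kind and the object ids are hypothetical and follow the wording of
the question, not any on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeEntry:
    kind: str       # "blob", "tree", or the proposed "git" reference
    name: str
    object_id: str  # for a "git" entry: a commit in the nested repository


def submodule_commits(entries):
    """Map each nested-repository path to the commit it is pinned to."""
    return {e.name: e.object_id for e in entries if e.kind == "git"}


root = [
    TreeEntry("blob", "Makefile", "1a2b"),
    TreeEntry("tree", "src", "3c4d"),
    TreeEntry("git", "libsubmodule1", "5e6f"),
]

print(submodule_commits(root))
```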

jdl


* Re: [RFC] Submodules in GIT
  2006-11-28 17:28                                   ` Daniel Barkalow
@ 2006-11-28 18:08                                     ` Sven Verdoolaege
  2006-11-28 18:37                                       ` Daniel Barkalow
  0 siblings, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-11-28 18:08 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git

On Tue, Nov 28, 2006 at 12:28:47PM -0500, Daniel Barkalow wrote:
> It would be wrong to do "commit -a" in submodules if the supermodule 
> weren't being committed with -a, of course.

What if you say "git commit submodule" ?
I sure hope you wouldn't want to do a "commit -a" in the submodule.
One of the nice features of git is that you can still perform most
operations if you have a dirty state, and I would very much want to
be able to commit only some changes in the submodule and then record
only that new submodule commit in the supermodule, without
having my other changes in the submodule committed as well.

If you agree with the above, then why should "git commit -a"
do any different from "git commit submodule" if submodule was
the only thing that got changed ?

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 18:08                                     ` Sven Verdoolaege
@ 2006-11-28 18:37                                       ` Daniel Barkalow
  2006-11-28 19:06                                         ` Sven Verdoolaege
  0 siblings, 1 reply; 252+ messages in thread
From: Daniel Barkalow @ 2006-11-28 18:37 UTC (permalink / raw)
  To: skimo; +Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git

On Tue, 28 Nov 2006, Sven Verdoolaege wrote:

> On Tue, Nov 28, 2006 at 12:28:47PM -0500, Daniel Barkalow wrote:
> > It would be wrong to do "commit -a" in submodules if the supermodule 
> > weren't being committed with -a, of course.
> 
> What if you say "git commit submodule" ?

Obviously no -a, as I said.

> If you agree with the above, then why should "git commit -a"
> do any different from "git commit submodule" if submodule was
> the only thing that got changed ?

If submodule was the only thing that got changed, it's not dirty; if it 
were dirty, some of its contents would also have gotten changed. Surely:

"git commit submodule/foo bar"

should do "git commit foo" in submodule, and then commit the supermodule 
with the new commit for the submodule and the change to bar. And so
"submodule/foo" is something you could commit changes to, so it should get 
picked up by -a.

Of course, if submodule *is* the *only* thing that changed (e.g., you did 
a fast-forward merge in it, or you've previously committed it completely), 
there won't be a "commit -a" in it, because that would just generate a 
gratuitous commit.

	-Daniel

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 18:37                                       ` Daniel Barkalow
@ 2006-11-28 19:06                                         ` Sven Verdoolaege
  2006-11-28 20:41                                           ` Daniel Barkalow
  0 siblings, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-11-28 19:06 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git

On Tue, Nov 28, 2006 at 01:37:54PM -0500, Daniel Barkalow wrote:
> If submodule was the only thing that got changed, it's not dirty; if it 
> were dirty, some of its contents would also have gotten changed.

For me, the commit is the only "content" of the subproject that the
superproject should care about, so the submodule being dirty or not
is completely irrelevant (for committing), but it seems you see the
subproject more as a (working) tree than as a commit. Of course, as
Linus already mentioned, a "git commit" could still warn you if the
subproject was dirty.

> Surely:
> 
> "git commit submodule/foo bar"

I wouldn't dream of doing such an operation, because it doesn't make
sense to me.  (So as far as I'm concerned, you can make it do whatever
you'd like it to do.)  You can only commit the subproject as a whole.

> should do "git commit foo" in submodule, and then commit the supermodule 
> with the new commit for the submodule and the change to bar. And so
> "submodule/foo" is something you could commit changes to, so it should get 
> picked up by -a.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 13:35                                       ` Andy Parkins
  2006-11-28 15:44                                         ` Shawn Pearce
@ 2006-11-28 19:58                                         ` Steven Grimm
  2006-11-28 21:02                                           ` Shawn Pearce
  2006-11-29 16:03                                         ` Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: Steven Grimm @ 2006-11-28 19:58 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins wrote:
> Unfortunately, during development, you've switched libsubmodule1 to 
> branch "development", but supermodule isn't tracking libsubmodule1/HEAD it's 
> tracking libsubmodule1/master.  Your supermodule commit doesn't capture a 
> snapshot of the tree you're using.
>   

How about if the supermodule commit errors out by default if you commit 
a different submodule branch than the one you committed the previous 
time? Require the user to explicitly acknowledge that yes, they want to 
check in the contents of "development" now, even though the supermodule 
was tracking "master" before.

Otherwise I think you could easily end up with just the opposite 
situation, where you forget you've checked out "development" for a 
moment to look at something, and end up inadvertently committing a bunch 
of stuff that's not ready for prime time yet. In a standalone git 
setting, that's no big deal since the commit only updates the current 
branch and doesn't touch the master branch, but (as I understand the 
proposal) in a supermodule setting you'd actually end up essentially 
doing a merge between your development branch and the previously 
committed master. Or maybe not a merge, but worse, you'd *replace* the 
previously committed master with what's in your dev branch.

I think wanting to commit a submodule on a different branch than last 
time is probably not a typical day-to-day use case, so we should make 
sure the user really wants to do it (but allow it if so.)

On a related note, it would be great from a usability point of view if 
there were a way to say "I always want to be on the same branch in all 
submodules and the supermodule." I think a common scenario will be that 
you are doing development that touches a couple of different 
applications and your development effort is really a single set of 
changes even though it happens to cross submodule boundaries. If this 
branches-in-sync option is turned on, I'd want "git checkout 
development" to check out the development branch in the entire set of 
repositories.

More generally, while I 100% agree that it's very useful to be able to 
operate independently on each submodule, I think it's also going to be 
common to use submodules to selectively clone different pieces of a 
larger project. Say your current development effort needs server A, 
library B, and documentation C, and you want to have *just* those pieces 
in your environment. You don't particularly care about the details of 
how the system has assembled the pieces you want; you want to be able to 
make your changes and push them when you're done. They are really just 
pieces of a larger code base, not independent entities that happen to be 
pulled together into a composite workspace temporarily.

For that use case, I don't want the system to act differently depending 
on whether server A and library B are in the same submodule or separate 
ones; I want to treat the supermodule as the repository, and the system 
should take care of the details of managing the submodules. When I do 
"git commit -a" I want it to give me one editor to write one commit 
comment that covers all of the changes I've made, and when I do "git 
checkout -b" I want a new branch to apply across all the files I'm 
working with.

It is entirely possible that the above is a matter best left to the 
porcelain layer, and that's fine with me. But I think the Perforce-style 
"compose a single workspace out of different bits of a larger project" 
model is hugely useful and whatever submodule system Git ends up with, 
it should be able to emulate as much of that feature as possible.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 19:06                                         ` Sven Verdoolaege
@ 2006-11-28 20:41                                           ` Daniel Barkalow
  2006-11-28 21:10                                             ` Shawn Pearce
  0 siblings, 1 reply; 252+ messages in thread
From: Daniel Barkalow @ 2006-11-28 20:41 UTC (permalink / raw)
  To: skimo; +Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git

On Tue, 28 Nov 2006, Sven Verdoolaege wrote:

> On Tue, Nov 28, 2006 at 01:37:54PM -0500, Daniel Barkalow wrote:
> > If submodule was the only thing that got changed, it's not dirty; if it 
> > were dirty, some of its contents would also have gotten changed.
> 
> For me, the commit is the only "content" of the subproject that the
> superproject should care about, so the submodule being dirty or not
> is completely irrelevant (for committing), but it seems you see the
> subproject more as a (working) tree than as a commit.

I think we agree on the tree/commit/object database model part.

I think we disagree on how the working *directories* relate. I see the 
checked-out state of a submodule as being relevant to the checked-out 
state of the supermodule, such that dirty state in the submodule directory 
is dirty state in the supermodule directory.

> > Surely:
> > 
> > "git commit submodule/foo bar"
> 
> I wouldn't dream of doing such an operation, because it doesn't make
> sense to me.  (So as far as I'm concerned, you can make it do whatever
> you'd like it to do.)  You can only commit the subproject as a whole.

I'm thinking that users of subprojects will often want to work on the
subprojects rather than exclusively using commits prepared by other 
people, and it's too much trouble to have to do the work in a repository 
for just the subproject and pull it into the superproject's submodule to 
test it. So the submodule working directory needs to function as a working 
directory for the subproject. Then

  "cd submodule; git commit foo"

does the obvious thing, but that should be the same as

  "git commit submodule/foo" (since it normally is)

and then it makes sense to let you do multiple commits with a single 
command when the paths end in different modules, since that's obviously 
what you're requesting, and then -a must do all of them.
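
One way a porcelain could get that equivalence is to map each path to the
repository that owns it before committing.  A rough sketch, assuming the
submodule is an ordinary nested checkout and a git that supports -C;
repo_for_path and commit_path are hypothetical helper names, not existing
git commands:

```shell
# Hypothetical helper: print the top-level directory of the innermost
# repository whose working tree contains the given path.
repo_for_path() {
    dir=$(dirname "$1") || return 1
    git -C "$dir" rev-parse --show-toplevel
}

# Hypothetical wrapper: commit one path in whichever repository owns it.
# Extra arguments (e.g. -m "message") are passed through to git commit.
commit_path() {
    path=$1; shift
    top=$(repo_for_path "$path") || return 1
    # Make the path absolute so it stays valid once git runs inside $top.
    abs=$(cd "$(dirname "$path")" && pwd -P)/$(basename "$path") || return 1
    git -C "$top" commit "$@" -- "$abs"
}
```

With these, `commit_path submodule/foo -m "..."` ends up as a commit in the
submodule, while `commit_path bar -m "..."` lands in the supermodule.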

	-Daniel

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 19:58                                         ` Steven Grimm
@ 2006-11-28 21:02                                           ` Shawn Pearce
  0 siblings, 0 replies; 252+ messages in thread
From: Shawn Pearce @ 2006-11-28 21:02 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Andy Parkins, git

Steven Grimm <koreth@midwinter.com> wrote:
> Andy Parkins wrote:
> >Unfortunately, during development, you've switched libsubmodule1 to 
> >branch "development", but supermodule isn't tracking libsubmodule1/HEAD 
> >it's tracking libsubmodule1/master.  Your supermodule commit doesn't 
> >capture a snapshot of the tree you're using.
> >  
>
> Or maybe not a merge, but worse, you'd *replace* the 
> previously committed master with what's in your dev branch.

Right, you would be replacing the prior branch of that submodule with
the new submodule branch.

I think the safety valve you are looking for here is two things:

  * don't automatically update the submodule's HEAD into the
    supermodule's index.

  * make sure the submodule's HEAD is a fast-forward of the
    supermodule's index, with a --force option to force it
    anyway.

Otherwise the developer just has to know what he/she is doing.
Today you can put stuff that isn't ready for prime-time into a
repository on the wrong branch just by applying the wrong patch,
or cherry-picking the wrong commit, etc...  the user can (and
will) make mistakes.  But they can also easily recover from them
by rewinding history and redoing it.
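
The second check could be sketched roughly as follows, assuming a git that
has "merge-base --is-ancestor" and -C.  The function names and the way the
previously recorded commit is passed in are made up for illustration:

```shell
# Hypothetical check: succeed only if commit $3 descends from commit $2 in
# repository $1, i.e. moving from $2 to $3 would be a fast-forward.
is_fast_forward() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Hypothetical guard: print the submodule HEAD that would be recorded, but
# refuse a non-fast-forward unless --force is given as third argument.
record_submodule_head() {
    sub=$1 recorded=$2 force=${3:-}
    new=$(git -C "$sub" rev-parse --verify HEAD) || return 1
    if [ "$force" != "--force" ] && ! is_fast_forward "$sub" "$recorded" "$new"; then
        echo "refusing: $new is not a fast-forward of $recorded" >&2
        return 1
    fi
    echo "$new"
}
```

Switching the submodule to an unrelated "development" branch then makes the
guard fail until the user explicitly passes --force.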

> On a related note, it would be great from a usability point of view if 
> there were a way to say "I always want to be on the same branch in all 
> submodules and the supermodule."

That's not really an issue.

A branch doesn't exist just because you checked-out the branch, or
because you created it.  A branch exists because there were two or
more commits (B and C) which use the same parent (A) and two or more
of those commits survive, i.e. they have refs which point to them
(directly or indirectly) or they were merged into another commit
which itself survives.

Therefore if the supermodule is on the "development branch" the
submodules are also immediately on the same branch, because their
HEADs are derived from whatever is stored in the supermodule's tree.
And that tree is derived from whatever "development branch" means.

Really what you want/need is a special head in the submodule
which acts as the "branch that corresponds to the supermodule".
This probably should just be a naked SHA1 stored in HEAD, which
is committable only because a supermodule exists in a higher level
directory.
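
Later git releases support this kind of HEAD under the name "detached
HEAD".  A small sketch of what it looks like from the command line,
assuming such a release; pin_to_commit and head_is_detached are
hypothetical helper names:

```shell
# Put the checkout in repository $1 on the bare commit id $2 instead of
# on a named branch.
pin_to_commit() {
    git -C "$1" checkout -q --detach "$2"
}

# Succeed when HEAD in repository $1 names a commit directly rather than
# a branch.  (Sketch only: it does not distinguish "not a repository".)
head_is_detached() {
    ! git -C "$1" symbolic-ref -q HEAD >/dev/null
}
```

A supermodule checkout could call something like pin_to_commit for each
submodule with the commit id stored in its tree.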

The fact that the submodule project has branches *at all* is
totally irrelevant once you start to speak about that submodule
within the supermodule, as it's the supermodule which determines
the branch of the submodule.

> But I think the Perforce-style 
> "compose a single workspace out of different bits of a larger project" 
> model is hugely useful

That's a mess.

You start to get into weird cases where the directory structure
expected by the build process is no longer intact, because the user
has sliced it apart in weird ways.  And there's no single version
which corresponds to that workspace as (if I recall correctly)
you can pick different tags or branches at will.  I believe that
ClearCase has the same bug.

You also can't version that now spliced workspace, aside from taking
the configuration file and putting that under version control too.

However I think the proposal on the table will support that to some
degree, in that you can take any version of any repository and embed
it at any directory of any other repository.  This means you can
for example embed the Linux kernel, glibc and gcc projects into
a larger "embedded device" repository, but you cannot alter the
structure of any of those three projects without making your own
locally developed branch of them.  Which is actually the correct
thing to do as any subslicing of a repository is exactly that:
a locally developed branch of that repository.

-- 

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 20:41                                           ` Daniel Barkalow
@ 2006-11-28 21:10                                             ` Shawn Pearce
  2006-11-28 21:32                                               ` Daniel Barkalow
  0 siblings, 1 reply; 252+ messages in thread
From: Shawn Pearce @ 2006-11-28 21:10 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: skimo, Andreas Ericsson, Linus Torvalds, Yann Dirson,
	Steven Grimm, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
>   "cd submodule; git commit foo"
> 
> does the obvious thing, but that should be the same as
> 
>   "git commit submodule/foo" (since it normally is)
> 
> and then it makes sense to let you do multiple commits with a single 
> command when the paths end in different modules, since that's obviously 
> what you're requesting, and then -a must do all of them.

Except what if the submodules have different commit message
standards?  E.g. one requires signoff and another doesn't?  Or one
allows privately held information (e.g. it's your corporate project)
and one doesn't (e.g. it's an open source project you use/contribute
to)?

But slightly more practical: the change message for the superproject
might simply be "resolved bug X, caused by ...".  Which may make a
lot of sense to the top level project, but makes no sense at all
in a submodule involved in the fix as the submodule's developer
community doesn't even know what "X" is, let alone how "..." could
have caused it.

So you really need to think twice before you apply the same commit
message to every project, as each commit message needs to make sense
with that one submodule's limited scope, or within the supermodule's
larger scope.

But if you really still think that the same commit message makes
sense everywhere, we have 'git commit -F'.  Write it out in a file
and hand it off to -F in each module.  This would be easier if
git-ls-files grew a new option:

	vi ~/msg
	# --submodules is the hypothetical new option, it does not exist yet
	for m in $(git ls-files --submodules); do
		(cd "$m" && git commit -F ~/msg)
	done
	git commit -F ~/msg

-- 

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 21:10                                             ` Shawn Pearce
@ 2006-11-28 21:32                                               ` Daniel Barkalow
  2006-11-28 21:53                                                 ` Linus Torvalds
  0 siblings, 1 reply; 252+ messages in thread
From: Daniel Barkalow @ 2006-11-28 21:32 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: skimo, Andreas Ericsson, Linus Torvalds, Yann Dirson,
	Steven Grimm, git

On Tue, 28 Nov 2006, Shawn Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > and then it makes sense to let you do multiple commits with a single 
> > command when the paths end in different modules, since that's obviously 
> > what you're requesting, and then -a must do all of them.
> 
> Except what if the submodules have different commit message
> standards?  E.g. one requires signoff and another doesn't?  Or one
> allows privately held information (e.g. it's your corporate project)
> and one doesn't (e.g. it's an open source project you use/contribute
> to)?

I don't think you'd ever want the same commit message for commits in two 
projects. In any case where you'd commit a submodule in the process of 
committing a supermodule, git would do this by recursively calling 
git-commit, which would prompt for separate commit messages.

	-Daniel

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 21:32                                               ` Daniel Barkalow
@ 2006-11-28 21:53                                                 ` Linus Torvalds
  0 siblings, 0 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-11-28 21:53 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Shawn Pearce, skimo, Andreas Ericsson, Yann Dirson, Steven Grimm,
	git

On Tue, 28 Nov 2006, Daniel Barkalow wrote:
> 
> I don't think you'd ever want the same commit message for commits in two 
> projects.

I don't know about "ever", but yes, I do think submodule commits are 
generally totally separate things from supermodule commits.

> In any case where you'd commit a submodule in the process of 
> committing a supermodule, git would do this by recursively calling 
> git-commit, which would prompt for separate commit messages.

That certainly works, although I'm not convinced that it's necessarily a 
hugely important detail.

I suspect there may well be more important things UI-wise wrt submodules 
than the "you may have to commit submodules separately" question.

For example, doing a "git pull" is a lot more interesting, since that 
actually has the potential of having to resolve conflicts in submodules 
before the supermodule can be committed. Getting all the "git reset" 
behaviour right for when you decide "oops, that was too complicated" is 
probably a lot more important than whether you have to have a separate 
"commit subproject" phase for the simple cases of doing a bog-standard 
"git commit -a".

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 13:35                                       ` Andy Parkins
  2006-11-28 15:44                                         ` Shawn Pearce
  2006-11-28 19:58                                         ` Steven Grimm
@ 2006-11-29 16:03                                         ` Martin Waitz
  2006-11-29 20:00                                           ` Andy Parkins
  2 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-11-29 16:03 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git


hoi :)

On Tue, Nov 28, 2006 at 01:35:37PM +0000, Andy Parkins wrote:
> The reason I thought it would have to be HEAD at all times, is to prevent 
> situations where the supermodule commit doesn't reflect the state of the 
> current tree.

The way I wanted to address this is to show in the supermodule
git-status that the submodule is using another branch.
That way you are warned and can decide not to commit the supermodule.

I implemented tracking of refs/heads/master (not HEAD) without much
thinking, and only recently began to think about possible problems with
this approach.

But I think it is an important design decision to take, so I'd like to
have consensus here.

Pro HEAD:
 - update-index on submodule really updates the supermodule index with
   a commit that resembles the working directory.
Contra HEAD:
 - HEAD is not guaranteed to be equal to the working directory anyway,
   you may have uncommitted changes.
 - when updating the supermodule, you have to take care that your
   submodules are on the right branch.
   You might for example have some testing-throwaway branch in one
   submodule and don't want to merge it with other changes yet.

Pro refs/heads/master:
 - the supermodule really tracks one defined branch of development.
 - you can easily overwrite one submodule by changing to another branch,
   without fearing that changes in the supermodule change anything
   there.
Contra refs/heads/master:
 - after updating the supermodule, you may not have the correct working
   directory checked out everywhere, because some submodules may be on a
   different branch.
 - there is one branch in the submodule which is special compared to all
   the others.

I think that most of the disadvantages of refs/heads/master can be
solved by printing the above-mentioned warning in git-status when the
submodule is using another branch (similar to the
planned-but-not-implemented warning if the submodule has uncommitted
changes).

I don't yet know how to cope with tracking HEAD directly, so I'm still
in favor of tracking refs/heads/master, as already implemented.

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 16:29                                           ` Andy Parkins
  2006-11-28 16:36                                             ` Shawn Pearce
  2006-11-28 17:38                                             ` Jon Loeliger
@ 2006-11-29 16:15                                             ` Martin Waitz
  2006-11-30 11:57                                             ` sf
  3 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-29 16:15 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Shawn Pearce


hoi :)

On Tue, Nov 28, 2006 at 04:29:05PM +0000, Andy Parkins wrote:
> In summary, from the supermodule's point of view:
>  * A submodule with changed working directory is "dirty-wd"
>  * A submodule with changed index is "dirty-idx"
>  * A submodule with changed HEAD (since the last supermodule commit) 
>    is "changed but not updated" and can hence be "update-index"ed into the
>    supermodule
>  * A submodule with changed HEAD that has been added to the supermodule index
>    is "updated but not checked in"
>  * A submodule with changed HEAD (since the last supermodule update-index) is
>    both "changed but not updated" _and_ "updated but not checked in", just 
>    like any normal file.

when tracking refs/heads/master instead of HEAD, you also get:
   * A submodule where HEAD is not pointing to refs/heads/master is
     "dirty-branch" or something.
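
As a rough illustration of how a status check could tell these states
apart, here is a sketch built on plain "git diff" and "git symbolic-ref".
The function name is hypothetical, the state names are the ones used in
this thread, and the default tracked branch is an assumption taken from
the refs/heads/master approach:

```shell
# Hypothetical classifier: prints one line per condition that holds for the
# repository in $1, and nothing at all for a clean submodule.
submodule_state() {
    sub=$1 tracked=${2:-refs/heads/master}
    # Working directory differs from the index.
    git -C "$sub" diff --quiet || echo dirty-wd
    # Index differs from the HEAD commit.
    git -C "$sub" diff --cached --quiet || echo dirty-idx
    # HEAD does not point at the branch the supermodule tracks.
    [ "$(git -C "$sub" symbolic-ref -q HEAD)" = "$tracked" ] || echo dirty-branch
}
```

The "changed HEAD" cases from the list above would additionally need the
commit id recorded in the supermodule, which this sketch leaves out.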


> What's needed then:
>  * A way of telling git to treat a particular directory as a submodule instead
>    of a directory
This is handled by creating a GIT repository in this directory.
My current implementation needs some more magic by the user to add it to
the index, but I plan to change this so that GIT repositories are
recognized as possible submodules.

>  * git-status gets knowledge of how to check for "dirty" submodules
This is on top of my TODO.

>  * git-commit-tree learns about how to store "submodule" object types in
>    trees.  The submodule object type will be nothing more than the hash of the
>    current HEAD commit.  (This might be my ignorance, perhaps it's just 
>    update-index that needs to know this)
it's only update-index that has to know this.
Otherwise it would be implicitly updated and you would never get your
"changed but not updated" status as above.


-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-29 16:03                                         ` Martin Waitz
@ 2006-11-29 20:00                                           ` Andy Parkins
  2006-11-30 12:16                                             ` Andreas Ericsson
  2006-11-30 17:06                                             ` Martin Waitz
  0 siblings, 2 replies; 252+ messages in thread
From: Andy Parkins @ 2006-11-29 20:00 UTC (permalink / raw)
  To: git

On Wednesday 2006, November 29 16:03, Martin Waitz wrote:

> The way I wanted to address this is to show in the supermodule
> git-status that the submodule is using another branch.
> That way you are warned and can decide not to commit the supermodule.

The problem I see with tracking a particular branch is that it makes it less 
convenient to use git's quick-branching features in the submodules.  Let's 
say I want to try something out quickly in a submodule: I make a branch, 
commit, commit, "hmm, looks good, let's snapshot it in the supermodule", make 
a supermodule branch, "oh no, I've got to tell the supermodule to track the 
new (but temporary) branch in the submodule", do a commit, switch the submodule 
branch back to master, delete the temporary branch, remember that the 
supermodule is tracking that branch and tell the supermodule to track 
something else instead...  It all seems too complicated to me.

> Pro HEAD:
>  - update-index on submodule really updates the supermodule index with
>    a commit that resembles the working directory.

Ouch.  Why does the submodule need to update the supermodule index?  That 
should be done by update-index in the supermodule.   Further, how is the 
supermodule index going to represent working directory changes in the 
submodule?  The only link between the two is a commit hash.  It has to be 
like that otherwise you haven't made a supermodule-submodule, you've just 
made one super-repository.  Also, if you don't store submodule commit hashes, 
then there is no way to guarantee that you're going to be able to get back the 
state of the submodule again.

> Contra HEAD:
>  - HEAD is not guaranteed to be equal to the working directory anyway,
>    you may have uncommitted changes.

That's the case for every file in a repository, so isn't really a worry.  It's 
the equivalent of changing a file and not updating the index - who cares?  As 
long as update-index tells you that the submodule is dirty and what to do to 
clean it, everything is great.

>  - when updating the supermodule, you have to take care that your
>    submodules are on the right branch.
>    You might for example have some testing-throwaway branch in one
>    submodule and don't want to merge it with other changes yet.

What is the "right" branch though?  As I said above, if you're tracking one 
branch in the submodule then you've effectively locked that submodule to that 
branch for all supermodule uses.  Or you've made yourself a big rod to beat 
yourself with every time you want to do some development on an "off" branch on 
the submodule.

> Pro refs/heads/master:
>  - the supermodule really tracks one defined branch of development.

Why is this a pro?

>  - you can easily overwrite one submodule by changing to another branch,
>    without fearing that changes in the supermodule change anything
>    there.

You can always do that anyway by simply not running update-index for the 
submodule in the supermodule.

> Contra refs/heads/master:
>  - after updating the supermodule, you may not have the correct working
>    directory checked out everywhere, because some submodules may be on a
>    different branch.

This seems like the biggest problem to me - doesn't this negate all the 
advantages of a submodule system?  After a check in, you have no idea if what 
you checked in was what was in your working tree.

Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-28 16:29                                           ` Andy Parkins
                                                               ` (2 preceding siblings ...)
  2006-11-29 16:15                                             ` Martin Waitz
@ 2006-11-30 11:57                                             ` sf
       [not found]                                               ` <200611301255.41733.andyparkins@gmail.com>
  3 siblings, 1 reply; 252+ messages in thread
From: sf @ 2006-11-30 11:57 UTC (permalink / raw)
  To: git; +Cc: Shawn Pearce

Andy Parkins wrote:
...
> Worse, if you allow that to happen, the supermodule can commit a state that 
> cannot be retrieved from the submodule's repository.  The ONLY thing a 
> supermodule can record about a submodule is a commit.

So what? You have a submodule commit that only exists in the 
supermodule. I fail to see the problem. The changes you made to the 
submodule _in the supermodule_ can later be pulled from wherever you want.

Regards

Stephan

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-29 20:00                                           ` Andy Parkins
@ 2006-11-30 12:16                                             ` Andreas Ericsson
  2006-11-30 12:40                                               ` Andy Parkins
  2006-11-30 17:06                                             ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-11-30 12:16 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins wrote:
> On Wednesday 2006, November 29 16:03, Martin Waitz wrote:
> 
>>  - when updating the supermodule, you have to take care that your
>>    submodules are on the right branch.
>>    You might for example have some testing-throwaway branch in one
>>    submodule and don't want to merge it with other changes yet.
> 
> What is the "right" branch though?  As I said above, if you're tracking one 
> branch in the submodule then you've effectively locked that submodule to that 
> branch for all supermodule uses.  Or you've made yourself a big rod to beat 
> yourself with every time you want to do some development on an "off" branch on 
> the submodule.
> 

Perhaps I'm just daft, but I fail to see how you can safely track a 
topic-branch that might get rewound or rebased in the submodule without 
crippling the supermodule. Wasn't the intention that the supermodule has 
a new tree object (called "submodule") that points to a commit in the 
submodule from where it gets its tree and stuff? Is the intention that 
the supermodule pulls all of the submodule's history into its own ODB? If 
so, what's the difference from just having one large repository? If 
not, how can you make it not break in case the commit it references in 
the submodule is pruned away?

One possible way would of course be to add something like this to the 
supermodule commit:
submodule directory/commit-sha1
tree submodule-tree-sha1

but then you're in trouble because the supermodule will have the same 
files as all the submodules stored in its own tree. I'm confused. Could 
someone shed some light on how this sub-/super-module connection is 
supposed to work in the supermodule's commit objects?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

* Re: [RFC] Submodules in GIT
  2006-11-30 12:16                                             ` Andreas Ericsson
@ 2006-11-30 12:40                                               ` Andy Parkins
  0 siblings, 0 replies; 252+ messages in thread
From: Andy Parkins @ 2006-11-30 12:40 UTC (permalink / raw)
  To: git

On Thursday 2006 November 30 12:16, Andreas Ericsson wrote:

> > What is the "right" branch though?  As I said above, if you're tracking
> > one branch in the submodule then you've effectively locked that submodule
> > to that branch for all supermodule uses.  Or you've made yourself a big
> > rod to beat yourself with every time you want to do some development on an
> > "off" branch on the submodule.
>
> Perhaps I'm just daft, but I fail to see how you can safely track a
> topic-branch that might get rewound or rebased in the submodule without
> crippling the supermodule. Wasn't the intention that the supermodule has

Who said anything about rebase/rewind?  As it happens though, I don't see why 
you can't (it wouldn't be pleasant though).  A rebase or rewind still leaves 
the original commit in the object database, so provided no one runs 
git-prune, there is no catastrophic failure.
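A rough sketch of that claim, assuming a throwaway repository; the file name, the "topic" branch and the demo identity are made up for the illustration:

```shell
set -eu
# Throwaway identity so the commits below work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > file
git add file
git commit -q -m "first"
base=$(git rev-parse HEAD)
main=$(git symbolic-ref --short HEAD)

# Do some work on a topic branch, then rewind that branch to where it started.
git checkout -q -b topic
echo "topic change" > file
git commit -q -a -m "topic work"
old=$(git rev-parse HEAD)
git checkout -q "$main"
git branch -f topic "$base" >/dev/null

# No ref points at $old any more, but the object is still in the database.
before=$(git cat-file -t "$old")

# Only after the reflogs are expired and git-prune runs does it disappear.
git reflog expire --expire=now --all
git prune --expire=now
if git cat-file -e "$old" 2>/dev/null; then after=present; else after=gone; fi
echo "before prune: $before, after prune: $after"
```

So a supermodule that recorded $old keeps working until somebody prunes the submodule repository.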

> a new tree object (called "submodule") that points to a commit in the
> submodule from where it gets its tree and stuff? Is the intention that
> the supermodule pulls all of the submodule's history into its own ODB? If
> so, what's the difference from just having one large repository? If
> not, how can you make it not break in case the commit it references in
> the submodule is pruned away?

I certainly never suggested anything /but/ storing a submodule type that 
stores the commit.  The current debate is about whether the supermodule 
should track HEAD or some defined branch in the submodule.

> but then you're in trouble because the supermodule will have the same
> files as all the submodules stored in its own tree. I'm confused. Could
> someone shed some light on how this sub-/super-module connection is
> supposed to work in the supermodule's commit objects?

I don't really know, I only joined in to stand up against a commit in the 
supermodule triggering commits in the submodule.  That led to me trying to 
get an understanding of how it would work.

As far as I can see, the only way a submodule is any use is if it is always a 
submodule-commit-hash that is noted in the supermodule tree object.  That 
means that the supermodule will only commit clean submodules.  The rest is 
just UI to show something useful in the difficult cases when the submodule 
tree is dirty.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

* Re: [RFC] Submodules in GIT
       [not found]                                               ` <200611301255.41733.andyparkins@gmail.com>
@ 2006-11-30 14:00                                                 ` Stephan Feder
  2006-11-30 14:49                                                   ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Stephan Feder @ 2006-11-30 14:00 UTC (permalink / raw)
  To: Andy Parkins; +Cc: Shawn Pearce, git

Andy Parkins wrote:
> On Thursday 2006 November 30 11:57, sf wrote:
> 
>> > Worse, if you allow that to happen, the supermodule can commit a state
>> > that cannot be retrieved from the submodule's repository.  The ONLY thing
>> > a supermodule can record about a submodule is a commit.
>>
>> So what? You have a submodule commit that only exists in the
>> supermodule. I fail to see the problem. The changes you made to the
>> submodule _in the supermodule_ can later be pulled from wherever you want.
> 
> Eh?  The files aren't stored in the supermodule, they're stored in the 
> submodule.  The ONLY thing in the supermodule is the commit hash.  The 
> objects for the submodule are still /in/ the submodule.

But you have got the submodule on your local disk anyway. So just set up 
alternates and the supermodule contains all of the submodule.
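A minimal sketch of what setting up alternates could look like, assuming two freshly created repositories; the names "sub" and "super", the file and the demo identity are only for illustration:

```shell
set -eu
# Throwaway identity so the commit below works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/sub"
git init -q "$work/super"

cd "$work/sub"
echo 'submodule content' > file
git add file
git commit -q -m "submodule commit"
sub_commit=$(git rev-parse HEAD)

cd "$work/super"
# Without alternates the supermodule repository cannot see the submodule commit.
if git cat-file -e "$sub_commit" 2>/dev/null; then before=visible; else before=missing; fi

# Point the supermodule's object database at the submodule's objects.
mkdir -p .git/objects/info
echo "$work/sub/.git/objects" > .git/objects/info/alternates
after=$(git cat-file -t "$sub_commit")
echo "before: $before, after: $after"
```

Nothing is copied here; the supermodule repository merely looks the objects up in the submodule's object directory.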

> It sounds like you're suggesting that the supermodule commit includes files 
> from the submodule?  How can that work?   The two aren't separate entities 
> then, it's just one big repository. 

It works as it always works in git: The supermodule commit contains the 
submodule commit, the submodule commit contains the submodule files, so 
the supermodule contains the submodule (at least the part of the 
submodule that is visible). It _must_ be one repository but it need not 
be big (once more, use alternates).

> I mean, what would this supermodule commit look like?  Would it include a 
> commit message?  Which module should that commit message be about?  Should 
> the commit's parents be stored?  Which parents, the submodule HEAD or the 
> supermodule HEAD?  Which tree object should it link to?  The one in the 
> submodule doesn't exist, so it'll have to be a freshly made up one for the 
> supermodule - except now you've put submodule paths in the supermodule.  
> Nope.  That's never going to work.

Again I do not see the problem. Probably I have a much simpler picture 
of submodules: They are just commits in the supermodule's tree. 
Everything else follows naturally from how git currently behaves.

Of course it works. It is simple, it is the git way.

Am I missing the point?

Regards

Stephan

-- 
b.i.t.
beratungsgesellschaft für informations-technologie mbh
Stephan Feder
elisabethenstr. 62   fon: +49(0)6151/827575
64283 darmstadt      fax: +49(0)6151/827576
mailto:sf@b-i-t.de   www: http://www.b-i-t.de

* Re: [RFC] Submodules in GIT
  2006-11-30 14:00                                                 ` Stephan Feder
@ 2006-11-30 14:49                                                   ` Andy Parkins
  2006-11-30 15:20                                                     ` Sven Verdoolaege
  2006-11-30 16:05                                                     ` sf
  0 siblings, 2 replies; 252+ messages in thread
From: Andy Parkins @ 2006-11-30 14:49 UTC (permalink / raw)
  To: git

On Thursday 2006 November 30 14:00, Stephan Feder wrote:

> Again I do not see the problem. Probably I have a much simpler picture
> of submodules: They are just commits in the supermodule's tree.
> Everything else follows naturally from how git currently behaves.

How are these commits any different from just having one big repository?  If 
some of the development of the submodule is contained in the supermodule then 
it's not a submodule anymore.

Why bother with all the effort to make a separation between submodule and 
supermodule and then store the submodule commits in the supermodule.  That's 
not supermodule/submodule git - that's just normal git.

Surely the whole point of having submodules is so that you can take the 
submodule away.  Let me give you an example.  Let's say I have a project that 
uses the libxcb library (some random project out in the world that uses git).  
I've arranged it something like this:

myproject (git root)
 |----- src
 |----- doc
 `----- libxcb (git root)

This works fine; with one problem.  When I make a commit in myproject, there 
is no link into the particular snapshot of the libxcb that I used at that 
moment.  If libxcb moves on, and makes incompatible changes, then when I 
checkout an old version of myproject, it won't compile any more because I'll 
need to find out which commit of libxcb I used at the time.

Submodules will solve this problem.  In the future I'll be able to check out 
any commit of myproject and it will automatically checkout the right commit 
from the libxcb repository.  Now let's say I'm working away and find a bug in 
libxcb; I fix it, commit it.  That change had better be stored in the libxcb 
repository, and had better make no reference to the myproject repository.  
Otherwise, I'm going to have to pollute the libxcb upstream repository with 
myproject if I want to share those fixes.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

* Re: [RFC] Submodules in GIT
  2006-11-30 14:49                                                   ` Andy Parkins
@ 2006-11-30 15:20                                                     ` Sven Verdoolaege
  2006-11-30 15:30                                                       ` Andy Parkins
  2006-11-30 16:05                                                     ` sf
  1 sibling, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-11-30 15:20 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

On Thu, Nov 30, 2006 at 02:49:53PM +0000, Andy Parkins wrote:
> How are these commits any different from just having one big repository?  If 

You can work on the submodule independently.

> some of the development of the submodule is contained in the supermodule then 
> it's not a submodule anymore.

On the contrary, that's exactly what a submodule is supposed to be.

> Why bother with all the effort to make a separation between submodule and 
> supermodule and then store the submodule commits in the supermodule.  That's 
> not supermodule/submodule git - that's just normal git.

[..]

> myproject (git root)
>  |----- src
>  |----- doc
>  `----- libxcb (git root)
> 
[..]
> 
> Submodules will solve this problem.  In the future I'll be able to check out 
> any commit of myproject and it will automatically checkout the right commit 
> from the libxcb repository.

How are you going to checkout the right commit of the libxcb repo if
you didn't store it in the supermodule ?


* Re: [RFC] Submodules in GIT
  2006-11-30 15:20                                                     ` Sven Verdoolaege
@ 2006-11-30 15:30                                                       ` Andy Parkins
  2006-11-30 15:50                                                         ` Andreas Ericsson
                                                                           ` (2 more replies)
  0 siblings, 3 replies; 252+ messages in thread
From: Andy Parkins @ 2006-11-30 15:30 UTC (permalink / raw)
  To: git

On Thursday 2006 November 30 15:20, Sven Verdoolaege wrote:

> You can work on the submodule independently.

It's not independent if any part of it is in the supermodule.

> > some of the development of the submodule is contained in the supermodule
> > then it's not a submodule anymore.
>
> On the contrary, that's exactly what a submodule is supposed to be.

I don't think so.  I think it just makes a complicated normal repository.

> How are you going to checkout the right commit of the libxcb repo if
> you didn't store it in the supermodule ?

Well, I know what the commit is; /that/ was all that was stored.  So I 
(actually supermodule-git does):

cd $DIRECTORY_ASSOCIATED_WITH_SUBMODULE
git checkout -f $COMMIT_FROM_SUPERMODULE

Obviously, this is grossly simplified.  It also requires that HEAD be allowed 
to be an arbitrary commit rather than a branch, but that's already been 
generally agreed upon as a good thing.
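As a sketch only: with a git that permits this (current releases do, and call it a detached HEAD), the two commands behave roughly as follows in a throwaway repository.  The file name, commit messages and demo identity are made up, and $wanted stands in for $COMMIT_FROM_SUPERMODULE:

```shell
set -eu
# Throwaway identity so the commits below work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo v1 > file
git add file
git commit -q -m "version 1"
wanted=$(git rev-parse HEAD)      # the commit the supermodule recorded
echo "version two" > file
git commit -q -a -m "version 2"

# Check out the recorded commit directly; HEAD then holds a commit, not a branch.
git checkout -q -f "$wanted"
head=$(git rev-parse HEAD)
if git symbolic-ref -q HEAD >/dev/null; then on_branch=yes; else on_branch=no; fi
content=$(cat file)
echo "HEAD=$head on_branch=$on_branch content=$content"
```

The working tree matches the recorded commit and no branch had to be created or moved to get there.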

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

* Re: [RFC] Submodules in GIT
  2006-11-30 15:30                                                       ` Andy Parkins
@ 2006-11-30 15:50                                                         ` Andreas Ericsson
  2006-11-30 16:08                                                           ` Andy Parkins
  2006-11-30 16:33                                                         ` Sven Verdoolaege
  2006-11-30 17:19                                                         ` Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-11-30 15:50 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins wrote:
> On Thursday 2006 November 30 15:20, Sven Verdoolaege wrote:
> 
>> You can work on the submodule independently.
> 
> It's not independent if any part of it is in the supermodule.
> 
>>> some of the development of the submodule is contained in the supermodule
>>> then it's not a submodule anymore.
>> On the contrary, that's exactly what a submodule is supposed to be.
> 
> I don't think so.  I think it just makes a complicated normal repository.
> 

I believe that Andy meant "development history" in his above sentence. 
Naturally, using the code from the submodule while being capable of 
developing the submodule separately from the supermodule is what 
submodules are all about.

>> How are you going to checkout the right commit of the libxcb repo if
>> you didn't store it in the supermodule ?
> 
> Well, I know what the commit is; /that/ was all that was stored.  So I 
> (actually supermodule-git does):
> 
> cd $DIRECTORY_ASSOCIATED_WITH_SUBMODULE
> git checkout -f $COMMIT_FROM_SUPERMODULE
> 
> Obviously, this is grossly simplified.  It also requires that HEAD be allowed 
> to be an arbitrary commit rather than a branch, but that's already been 
> generally agreed upon as a good thing.
> 

It has? We're not talking supermodule specific things anymore, are we?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

* Re: [RFC] Submodules in GIT
  2006-11-30 14:49                                                   ` Andy Parkins
  2006-11-30 15:20                                                     ` Sven Verdoolaege
@ 2006-11-30 16:05                                                     ` sf
  2006-11-30 16:12                                                       ` sf
  2006-12-01  9:19                                                       ` Andy Parkins
  1 sibling, 2 replies; 252+ messages in thread
From: sf @ 2006-11-30 16:05 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins wrote:
> On Thursday 2006 November 30 14:00, Stephan Feder wrote:
> 
>> Again I do not see the problem. Probably I have a much simpler picture
>> of submodules: They are just commits in the supermodule's tree.
>> Everything else follows naturally from how git currently behaves.
> 
> How are these commits any different from just having one big repository?  If 
> some of the development of the submodule is contained in the supermodule then 
> it's not a submodule anymore.

Right now you only have commits of the top directory aka the super 
project. Every subdirectory is just that: a directory (which git stores 
as trees).

Now, if you have a subdirectory that git stores as a commit, not a tree, 
you have a subproject. It is a directory with history, and because the 
commit is part of your superproject, you have access to this history.

> Why bother with all the effort to make a separation between submodule and 
> supermodule and then store the submodule commits in the supermodule.  That's 
> not supermodule/submodule git - that's just normal git.

No, it is not. Currently, there is no way to store a commit within the 
contents of another commit. You can only store trees and blobs.
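For illustration, a sketch of what such an entry can look like, assuming a git new enough to know tree entries of mode 160000 (later releases do).  The repository layout is borrowed from the example quoted below; the file names and the demo identity are hypothetical.  Note that in this form only the hash of the commit is recorded in the tree:

```shell
set -eu
# Throwaway identity so the commits below work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/myproject"
git init -q "$work/myproject/libxcb"

cd "$work/myproject/libxcb"
echo 'int xcb;' > xcb.c
git add xcb.c
git commit -q -m "libxcb: initial commit"
sub_commit=$(git rev-parse HEAD)

cd "$work/myproject"
echo 'int main(void) { return 0; }' > main.c
# Adding a directory that is itself a repository records its HEAD commit.
git add main.c libxcb 2>/dev/null
git commit -q -m "myproject: record libxcb"

# The tree entry for libxcb is a commit, not a tree.
entry=$(git ls-tree HEAD libxcb)
mode=$(echo "$entry" | cut -d' ' -f1)
type=$(echo "$entry" | cut -d' ' -f2)
recorded=$(git rev-parse HEAD:libxcb)
echo "$entry"
```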

> Surely the whole point of having submodules is so that you can take the 
> submodule away.  Let me give you an example.  Let's say I have a project that 
> uses the libxcb library (some random project out in the world that uses git).  
> I've arranged it something like this:
> 
> myproject (git root)
>  |----- src
>  |----- doc
>  `----- libxcb (git root)
> 
> This works fine; with one problem.  When I make a commit in myproject, there 
> is no link into the particular snapshot of the libxcb that I used at that 
> moment.  If libxcb moves on, and makes incompatible changes, then when I 
> checkout an old version of myproject, it won't compile any more because I'll 
> need to find out which commit of libxcb I used at the time.

OK.

> Submodules will solve this problem.  In the future I'll be able to check out 
> any commit of myproject and it will automatically checkout the right commit 
> from the libxcb repository.

OK, I am still with you so far.

> Now let's say I'm working away and find a bug in 
> libxcb; I fix it, commit it.  That change had better be stored in the libxcb 
> repository, and had better make no reference to the myproject repository.  
> Otherwise, I'm going to have to pollute the libxcb upstream repository with 
> myproject if I want to share those fixes.

Here comes the part where we did not meet before.

Of course you do not make any reference from your subproject to your 
superproject. You do exactly what you do in git today when you work with 
different branches:

Step 1: You fix a bug in myproject's subdirectory libxcb.

Step 2: You commit to myproject. myproject now contains a new commit 
object in path libxcb. (How to do that is up to the UI but at the 
repository level the outcome should be obvious). This commit is local to 
your repository.

Step 3: You propose your changes to the libxcb upstream (it might not be 
a repository you have write access to). I use the following made up 
syntax (see man git-rev-parse):

A suffix : followed by a path, _followed by a suffix //::_ names the 
_revision_ at the given path in the tree-ish object named by the part 
before the colon.

Step 3a: Generate a patch

git diff libxcb//^..libxcb//

Step 3b: Push your changes

git push <libxcb-repository> HEAD:libxcb//:<branch in libxcb-repository>

Step 3c: Let your changes be pulled

"Hello, please pull <myproject-repository> HEAD:libxcb//:<branch in 
libxcb-repository>"

Step 4: Pull upstream version (hopefully with your changes, otherwise 
you have to merge)

git pull <libxcb-repository> <branch in libxcb-repository>::HEAD:libxcb//

See, it works.

From what I understand you want to do the commit and push steps in one 
go. How do you want to record local (to your superproject) changes to 
the subproject?

Regards

* Re: [RFC] Submodules in GIT
  2006-11-30 15:50                                                         ` Andreas Ericsson
@ 2006-11-30 16:08                                                           ` Andy Parkins
  0 siblings, 0 replies; 252+ messages in thread
From: Andy Parkins @ 2006-11-30 16:08 UTC (permalink / raw)
  To: git; +Cc: Andreas Ericsson

On Thursday 2006 November 30 15:50, Andreas Ericsson wrote:

> > Obviously, this is grossly simplified.  It also requires that HEAD be
> > allowed to be an arbitrary commit rather than a branch, but that's
> > already been generally agreed upon as a good thing.
>
> It has? We're not talking supermodule specific things anymore, are we?

Not entirely, although I think it's going to be handy for submodules.  It was 
in a thread about remote branches.  By allowing checkout of any commit 
rather than only those that have a refs/heads/ entry, you effectively have a 
read-only checkout.  You obviously couldn't commit to a repository like this, 
because HEAD wouldn't point at anything that is changeable.  It would be very 
easy to just git-branch from there and start work though.

I think it's going to be necessary for the submodule work, because without it 
the supermodule will have to create its own temporary branches in the 
submodule in order to checkout an arbitrary commit.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE

* Re: [RFC] Submodules in GIT
  2006-11-30 16:05                                                     ` sf
@ 2006-11-30 16:12                                                       ` sf
  2006-12-01  9:19                                                       ` Andy Parkins
  1 sibling, 0 replies; 252+ messages in thread
From: sf @ 2006-11-30 16:12 UTC (permalink / raw)
  To: git

sf wrote:
...
> A suffix : followed by a path, _followed by a suffix //::_ names the 
> _revision_ at the given path in the tree-ish object named by the part 
> before the colon.

Sorry, that was supposed to read: followed by a suffix //


* Re: [RFC] Submodules in GIT
  2006-11-30 15:30                                                       ` Andy Parkins
  2006-11-30 15:50                                                         ` Andreas Ericsson
@ 2006-11-30 16:33                                                         ` Sven Verdoolaege
  2006-12-01  0:01                                                           ` Andy Parkins
  2006-11-30 17:19                                                         ` Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-11-30 16:33 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

On Thu, Nov 30, 2006 at 03:30:49PM +0000, Andy Parkins wrote:
> On Thursday 2006 November 30 15:20, Sven Verdoolaege wrote:
> > How are you going to checkout the right commit of the libxcb repo if
> > you didn't store it in the supermodule ?
> 
> Well, I know what the commit is; /that/ was all that was stored.  So I 

Then I have no idea what you are talking about.
A commit _contains_ all the history that led up to that commit,
so if you have the commit, then you also have the history.


* Re: [RFC] Submodules in GIT
  2006-11-29 20:00                                           ` Andy Parkins
  2006-11-30 12:16                                             ` Andreas Ericsson
@ 2006-11-30 17:06                                             ` Martin Waitz
  2006-11-30 18:57                                               ` Andreas Ericsson
  2006-12-01  9:02                                               ` Andy Parkins
  1 sibling, 2 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-30 17:06 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 6067 bytes --]

hoi :)

On Wed, Nov 29, 2006 at 08:00:22PM +0000, Andy Parkins wrote:
> On Wednesday 2006, November 29 16:03, Martin Waitz wrote:
> 
> > The way I wanted to address this is to show in the supermodule
> > git-status that the submodule is using another branch.
> > That way you are warned and can decide not to commit the supermodule.
> 
> The problem I see with tracking a particular branch is that it makes it less 
> convenient to use git's quick-branching features in the submodules.  Let's 
> say I want to try something out quickly in a submodule, I make a branch, 
> commit, commit, "hmm, looks good, let's snapshot it in the supermodule", make 
> a supermodule branch, "oh no, I've got to tell the supermodule to track the 
> new (but temporary) branch in the submodule do a commit, switch the submodule 
> branch back to master, delete the temporary branch, remember that the 
> supermodule is tracking that branch and tell the supermodule to track 
> something else instead...  It all seems too complicated to me.

What about:
You decide to try something out quickly and create a new branch in the
submodule. After you have verified that it works, you merge it to the
submodules master branch and commit that to the supermodule.
Not that complicated, is it?
In fact, my current implementation does not even allow changing the
name of the submodule branch which is tracked by the supermodule ;-).

> > Pro HEAD:
> >  - update-index on submodule really updates the supermodule index with
> >    a commit that resembles the working directory.
> 
> Ouch.  Why does the submodule need to update the supermodule index?

Please excuse that I am not a native English speaker and I may have
caused some confusion here.

> That should be done by update-index in the supermodule.

That is exactly what I wanted to say. In the supermodule you call
update-index (with the submodule path as argument) to update the index
of the supermodule. Just like normal files. Nothing new.

> Further, how is the supermodule index going to represent working
> directory changes in the submodule?  The only link between the two is
> a commit hash.  It has to be like that otherwise you haven't made a
> supermodule-submodule, you've just made one super-repository.  Also,
> if you don't store submodule commit hashes, then there is no way to
> guarantee that you're going to be able to get back the state of the
> submodule again.

This is handled in the next paragraph.
The argument really is: HEAD always points to the checked out branch,
so it really has a relationship to the working directory.

> > Contra HEAD:
> >  - HEAD is not guaranteed to be equal to the working directory anyway,
> >    you may have uncommitted changes.
> 
> That's the case for every file in a repository, so isn't really a
> worry.  It's the equivalent of changing a file and not updating the
> index - who cares?  As long as update-index tells you that the
> submodule is dirty and what to do to clean it, everything is great.

Yes, it's not a real counter-argument, but it qualifies the previous
pro-argument.

> >  - when updating the supermodule, you have to take care that your
> >    submodules are on the right branch.
> >    You might for example have some testing-throwaway branch in one
> >    submodule and don't want to merge it with other changes yet.
> 
> What is the "right" branch though?  As I said above, if you're tracking one 
> branch in the submodule then you've effectively locked that submodule to that 
> branch for all supermodule uses.

yes, but luckily GIT branches are very flexible.

> Or you've made yourself a big rod to beat yourself with every time you
> want to do some development on an "off" branch on the submodule.

I don't think it is that bad.

> > Pro refs/heads/master:
> >  - the supermodule really tracks one defined branch of development.
> 
> Why is this a pro?

You always know which branch in the submodule is the "upstream" branch
which is managed by the supermodule.
You can easily have several topic-branches and merge updates from the
master branch.
Otherwise you always have to remember which branch holds your current
contents from the supermodule.

When viewed from the supermodule, you are storing one branch per
submodule in your tree.

> >  - you can easily overwrite one submodule by changing to another branch,
> >    without fearing that changes in the supermodule change anything
> >    there.
> 
> You can always do that anyway by simply not running update-index for the 
> submodule in the supermodule.

Suppose you are working on a complicated feature in one submodule.
You create your own branch for that feature and work on it.
Now you want to update your project, so you pull a new supermodule
version. Now this pull also included one (for you unimportant) change
in the submodule.
I think it is clearer to update the master branch with the new
version coming from the supermodule, while leaving your work intact
(you haven't committed it to the supermodule yet, so the supermodule
should not care about your changes, it's just some dirty tree).
Then you can freely merge between your branch and master as you like and
are not forced to merge at once. And perhaps you even do not want to
merge at all, because you are on an experimental branch which really is
mutually exclusive with the current supermodule contents.

> > Contra refs/heads/master:
> >  - after updating the supermodule, you may not have the correct working
> >    directory checked out everywhere, because some submodules may be on a
> >    different branch.
> 
> This seems like the biggest problem to me - doesn't this negate all the 
> advantages of a submodule system?  After a check in, you have no idea if what 
> you checked in was what was in your working tree.

Of course you know: git-status will tell you.
This is no different to today, where you can commit while still leaving
a part of the tree dirty.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [RFC] Submodules in GIT
  2006-11-30 15:30                                                       ` Andy Parkins
  2006-11-30 15:50                                                         ` Andreas Ericsson
  2006-11-30 16:33                                                         ` Sven Verdoolaege
@ 2006-11-30 17:19                                                         ` Martin Waitz
  2 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-11-30 17:19 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1065 bytes --]

hoi :)

On Thu, Nov 30, 2006 at 03:30:49PM +0000, Andy Parkins wrote:
> Well, I know what the commit is; /that/ was all that was stored.  So I 
> (actually supermodule-git does):
> 
> cd $DIRECTORY_ASSOCIATED_WITH_SUBMODULE
> git checkout -f $COMMIT_FROM_SUPERMODULE
> 
> Obviously, this is grossly simplified.  It also requires that HEAD be allowed 
> to be an arbitrary commit rather than a branch, but that's already been 
> generally agreed upon as a good thing.

It's not that easy.

You also have to make sure that all submodule commits that have _ever_
been part of your submodule stay in your repository forever.
Consider that your submodule switches to another branch and some
old commits are not referenced by the current version any more.
These old commits still have to survive a git-prune, if they have been
part of some old supermodule version.
So you really have to connect both object databases and it's not enough
to just store the commit sha1 without the GIT core actually parsing
it.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [RFC] Submodules in GIT
  2006-11-30 17:06                                             ` Martin Waitz
@ 2006-11-30 18:57                                               ` Andreas Ericsson
  2006-12-01  8:49                                                 ` Andy Parkins
                                                                   ` (2 more replies)
  2006-12-01  9:02                                               ` Andy Parkins
  1 sibling, 3 replies; 252+ messages in thread
From: Andreas Ericsson @ 2006-11-30 18:57 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Andy Parkins, git

Martin Waitz wrote:
> hoi :)
> 
> On Wed, Nov 29, 2006 at 08:00:22PM +0000, Andy Parkins wrote:
>> On Wednesday 2006, November 29 16:03, Martin Waitz wrote:
>>
>> Further, how is the supermodule index going to represent working
>> directory changes in the submodule?  The only link between the two is
>> a commit hash.  It has to be like that otherwise you haven't made a
>> supermodule-submodule, you've just made one super-repository.  Also,
>> if you don't store submodule commit hashes, then there is no way to
>> guarantee that you're going to be able to get back the state of the
>> submodule again.
> 
> This is handled in the next paragraph.
> The argument really is: HEAD always points to the checked out branch,
> so it really has a relationship to the working directory.
> 
>>> Contra HEAD:
>>>  - HEAD is not guaranteed to be equal to the working directory anyway,
>>>    you may have uncommitted changes.
>> That's the case for every file in a repository, so isn't really a
>> worry.  It's the equivalent of changing a file and not updating the
>> index - who cares?  As long as update-index tells you that the
>> submodule is dirty and what to do to clean it, everything is great.
> 
> Yes, it's not a real counter-argument, but it qualifies the previous
> pro-argument.
> 
>>>  - when updating the supermodule, you have to take care that your
>>>    submodules are on the right branch.
>>>    You might for example have some testing-throwaway branch in one
>>>    submodule and don't want to merge it with other changes yet.
>> What is the "right" branch though?  As I said above, if you're tracking one 
>> branch in the submodule then you've effectively locked that submodule to that 
>> branch for all supermodule uses.
> 
> yes, but luckily GIT branches are very flexible.
> 

There's no real technical reason for locking it to a single branch 
though, and in case of a fork in the upstream submodule project, you 
might suddenly decide that "the other team" is heading in a much more 
interesting direction and you want to use their work in your module 
instead. Will you now have to maintain a separate branch just to keep 
the same name as the branch the original team used?

>> Or you've made yourself a big rod to beat yourself with every time you
>> want to do some development on an "off" branch on the submodule.
> 
> I don't think it is that bad.
> 

It could be, and as has already been stated, there's no real reason to 
limit this to a particular branch, so I don't see why we would want to 
impose such non-real restrictions.

>>> Pro refs/heads/master:
>>>  - the supermodule really tracks one defined branch of development.
>> Why is this a pro?
> 
> You always know which branch in the submodule is the "upstream" branch
> which is managed by the supermodule.

No you don't. The branch-name might be moved to some other tip of the 
DAG, and that's exactly the same as changing the branch you're tracking.

> You can easily have several topic-branches and merge updates from the
> master branch.
> otherwise you always have to remember which branch holds your current
> contents from the supermodule.
> 

No you don't. The only thing you need is the commit-sha.

> When viewed from the supermodule, you are storing one branch per
> submodule in your tree.
> 

Wrong again. You're storing one particular point in the revision history.
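
To illustrate the difference between recording a branch name and recording
a commit, here is a minimal sketch. It is a toy model only (a plain dict
standing in for a repository's refs, with made-up commit ids), not how Git
stores anything:

```python
# Toy model: a repository's refs are a mutable name -> commit-id map.
# The commit ids below are invented placeholders.
refs = {"refs/heads/master": "aaaa111"}

# A supermodule that records the commit id keeps the value, not the name.
pinned = refs["refs/heads/master"]

# Later the branch name is moved to some other tip of the DAG.
refs["refs/heads/master"] = "bbbb222"

by_name = refs["refs/heads/master"]  # resolves to wherever the name points now
by_sha = pinned                      # still the originally recorded commit

print(by_name, by_sha)
```

Looking the branch up by name follows the move, while the recorded id does
not change, which is the sense in which only the commit-sha is needed.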

>>>  - you can easily overwrite one submodule by changing to another branch,
>>>    without fearing that changes in the supermodule change anything
>>>    there.
>> You can always do that anyway by simply not running update-index for the 
>> submodule in the supermodule.
> 
> Suppose you are working on a complicated feature in one submodule.
> You create your own branch for that feature and work on it.
> Now you want to update your project, so you pull a new supermodule
> version. Now this pull also included one (for you unimportant) change
> in the submodule.

git reset to the rescue.

> I think it is more clear to update the master branch with the new
> version coming from the supermodule, while leaving your work intact
> (you haven't committed it to the supermodule yet, so the supermodule
> should not care about your changes, it's just some dirty tree).
> Then you can freely merge between your branch and master as you like and
> are not forced to merge at once. And perhaps you even do not want to
> merge at all, because you are on an experimental branch which really is
> mutually exclusive with the current supermodule contents.
> 

This is all just policy though. Tools that enforce a certain policy are 
not good tools.

The only problem I'm seeing atm is that the supermodule somehow has to 
mark whatever commits it's using from the submodule inside the submodule 
repo so that they effectively become un-prunable, otherwise the 
supermodule may some day find itself with a history that it can't restore.

The really major problem with this is that now you'll have one 
repository of the submodule that is actually special, so it's not 
certain you can go and use any repository at all of the submodule code, 
since the upstream repo most likely won't be all that interested in 
having all of that meta-data inside it. In reality, I'm sure this will 
be a small problem though, as submodules that are in reality projects 
which the supermodule's maintainer isn't the owner of will most likely 
never rewind their history beyond the supermodule's stored commit. It's 
something fsck will have to be taught to watch for though.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [RFC] Submodules in GIT
  2006-11-30 16:33                                                         ` Sven Verdoolaege
@ 2006-12-01  0:01                                                           ` Andy Parkins
  2006-12-01  0:11                                                             ` Jakub Narebski
  2006-12-01  9:32                                                             ` Sven Verdoolaege
  0 siblings, 2 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-01  0:01 UTC (permalink / raw)
  To: git

On Thursday 2006, November 30 16:33, Sven Verdoolaege wrote:
> > Well, I know what the commit is /that/ was all that was stored.  So I
>
> Then I have no idea what you are talking about.
> A commit _contains_ all the history that led up to that commit,
> so if you have the commit, then you also have the history.

It's not so much an actual commit, as a reference to a commit in another 
repository.

Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE


* Re: [RFC] Submodules in GIT
  2006-12-01  0:01                                                           ` Andy Parkins
@ 2006-12-01  0:11                                                             ` Jakub Narebski
  2006-12-01  9:32                                                             ` Sven Verdoolaege
  1 sibling, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-01  0:11 UTC (permalink / raw)
  To: git

Andy Parkins wrote:

> On Thursday 2006, November 30 16:33, Sven Verdoolaege wrote:
>>>
>>> Well, I know what the commit is /that/ was all that was stored.  So I
>>
>> Then I have no idea what you are talking about.
>> A commit _contains_ all the history that led up to that commit,
>> so if you have the commit, then you also have the history.
> 
> It's not so much an actual commit, as a reference to a commit in another 
> repository.

Hmmm... I thought the idea was that submodule commit is available in the
object repository, be it via alternates mechanism pointing to the submodule
repository for alternate storage, or submodule being in "unrelated"
branch/tracking branch.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [RFC] Submodules in GIT
  2006-11-30 18:57                                               ` Andreas Ericsson
@ 2006-12-01  8:49                                                 ` Andy Parkins
  2006-12-01  9:33                                                   ` Andreas Ericsson
  2006-12-01 12:03                                                 ` sf
  2006-12-05  9:01                                                 ` Uwe Kleine-Koenig
  2 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01  8:49 UTC (permalink / raw)
  To: git

On Thursday 2006 November 30 18:57, Andreas Ericsson wrote:

(agree with everything in your mail)

> The only problem I'm seeing atm is that the supermodule somehow has to
> mark whatever commits it's using from the submodule inside the submodule
> repo so that they effectively become un-prunable, otherwise the
> supermodule may some day find itself with a history that it can't restore.

What about submodule/.git/refs/supermodule/commit12345678, where "12345678" is 
the hash of the supermodule commit?  This gives a convenient route in the 
submodule to which commit contains that commit from the submodule; but 
doesn't write anything into the submodule repository itself.  It's just a tag 
with a different intent.
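
A rough sketch of why such a ref would be enough to keep the commit from
being pruned. This is a toy reachability model, not Git's prune
implementation; the commit ids are placeholders, and the ref name simply
follows the naming suggested above:

```python
def reachable(refs, parents):
    """Return every commit reachable from any ref by following parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# Toy submodule history c1 <- c2 <- c3, where master was rewound to c1.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
refs = {"refs/heads/master": "c1"}

# Without an extra ref, c2 and c3 are unreachable and could be pruned.
prunable_without = set(parents) - reachable(refs, parents)

# Hypothetical ref: "12345678" stands for the supermodule commit that
# recorded submodule commit c3.
refs["refs/supermodule/commit12345678"] = "c3"
prunable_with = set(parents) - reachable(refs, parents)

print(sorted(prunable_without), sorted(prunable_with))
```

Anything reachable from a ref survives, so one extra ref per recorded
commit keeps that commit and all of its ancestors.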


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-11-30 17:06                                             ` Martin Waitz
  2006-11-30 18:57                                               ` Andreas Ericsson
@ 2006-12-01  9:02                                               ` Andy Parkins
  2006-12-01 11:00                                                 ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01  9:02 UTC (permalink / raw)
  To: git

On Thursday 2006 November 30 17:06, Martin Waitz wrote:

> You can easily have several topic-branches and merge updates from the
> master branch.
> otherwise you always have to remember which branch holds your current
> contents from the supermodule.

WHAT?  I've got to make merges (that I don't necessarily want) in order to 
commit in the supermodule?  This completely negates any useful functioning of 
branches in the submodule.  I want to be able to make a quick development 
branch in the submodule and NOT merge that code into master and then be able 
to still commit that in the supermodule.

I think you're imagining the binding between the super and sub is very much 
tighter than it should be.  What if I'm working on a development version of 
the supermodule, which includes a stable version of the submodule?  Vice 
versa?

> When viewed from the supermodule, you are storing one branch per
> submodule in your tree.

That prevents me "trying something out" on a topic branch in the submodule.  
Here's a scenario using my suggested "supermodule tracks submodule HEAD" 
method.

 * You're developerA
 * Make a development branch in the supermodule
 * In the submodule, make a whole load of topic branches
 * Make a development branch in the submodule
 * Merge the topic branches into the development branch of the submodule
 * Commit in the supermodule.  This capture
 * Tag that commit "my-tested-arrangement-of-submodule-features"
 * Push that tag to the central repository - tell the world.
 * DeveloperB checks out that tag and tries it.  Great stuff.

Now: here's the secret fact that I didn't tell you that will break 
your "supermodule tracks submodule branch" method.  DeveloperB has decided to 
have this in his remote:
  Pull: refs/heads/master:refs/heads/upstream/master
Oops. The supermodule, which has been told to track the "master" branch in the 
submodule is tracking different things in developerA's repository from 
developerB's repository.  Worse, what if developerB did this:
  Pull: refs/heads/master:refs/heads/development
  Pull: refs/heads/development:refs/heads/master

Branches are completely arbitrary per-repository.  You cannot rely on them 
being consistent between different repositories.  If you store the name of a 
submodule branch in a supermodule - that supermodule is only valid for that 
one special case of your particular version of the submodule.
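
The effect of those Pull: lines can be sketched with a toy mapping
function. This is a simplification I am assuming for illustration (plain
"src:dst" pairs only, no "+" prefix or wildcards), and the commit ids are
placeholders:

```python
def apply_pull_lines(remote_refs, pull_lines):
    """Map remote ref names to local ref names using simple src:dst pairs."""
    local = {}
    for line in pull_lines:
        src, dst = line.split(":")
        if src in remote_refs:
            local[dst] = remote_refs[src]
    return local


# The same upstream submodule repository, seen by two developers.
remote = {"refs/heads/master": "m1", "refs/heads/development": "d1"}

dev_a = apply_pull_lines(remote, ["refs/heads/master:refs/heads/master"])
dev_b = apply_pull_lines(
    remote,
    [
        "refs/heads/master:refs/heads/development",
        "refs/heads/development:refs/heads/master",
    ],
)

# "master" names different commits in the two repositories.
print(dev_a["refs/heads/master"], dev_b["refs/heads/master"])
```

A supermodule that stores the name "master" would therefore resolve to m1
for developerA and d1 for developerB, while a stored commit id would be
the same for both.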

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-11-30 16:05                                                     ` sf
  2006-11-30 16:12                                                       ` sf
@ 2006-12-01  9:19                                                       ` Andy Parkins
  2006-12-01  9:57                                                         ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01  9:19 UTC (permalink / raw)
  To: git

On Thursday 2006 November 30 16:05, sf wrote:

> Step 2: You commit to myproject. myproject now contains a new commit
> object in path libxcb. (How to do that is up to the UI but at the
> repository level the outcome should be obvious). This commit is local to
> your repository.

Let's imagine a supermodule repository, and guess at it in more detail (I'll 
abbreviate some of the less interesting output):

$ git-cat-file -p HEAD
tree fb02e78085ecf2f29045603df858b5362e5bf8a4
parent 4f2dba685507e4a8e07dac298c4024feaec6bd7d
author Andy Parkins
committer Andy Parkins 
$ git-cat-file -p fb02e78085ecf2f29045603df858b5362e5bf8a4
100644 blob 46bd4e284a57e2faa539e7b72d62a38867075af5    Makefile
040000 tree 49ea01373a986a3db44d66702714aa75059ffa2c    doc
040000 subm d0a877464dc0198667a3e27ed3af8448ddacf947    libxcb

The "subm" type is our new ODB object that's going to store whatever we will 
need to access the submodule.  "libxcb" has already told us where this 
submodule is in the supermodule tree.

$ git-cat-file -p d0a877464dc0198667a3e27ed3af8448ddacf947
submodulecommithash ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7
submoduleurlhint git://anongit.freedesktop.org/git/xcb/libxcb

Here "submodulecommithash" is telling us what commit in the submodule is 
stored in this supermodule tree.  The "submoduleurlhint" is to help when 
git-clone is used to clone this supermodule.
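
As a sketch of how little such an object would need to carry, here is a
toy parser for the two-field body shown above. The "subm" object and its
field names are only the guess made in this mail, not an existing Git
object type, and parse_subm is a made-up helper:

```python
def parse_subm(body):
    """Parse 'key value' lines of the hypothetical subm object into a dict."""
    fields = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


body = (
    "submodulecommithash ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7\n"
    "submoduleurlhint git://anongit.freedesktop.org/git/xcb/libxcb\n"
)
subm = parse_subm(body)

# The commit hash is the only field needed to restore the submodule state;
# the URL is merely a hint for a first clone.
print(subm["submodulecommithash"])
```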

The key thing I wanted to point out here is the line:
  submodulecommithash ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7
This is the ONLY link you have to the submodule.  I think this line represents 
the fundamental difference between our thinking on submodules.

I say:
 submodulecommithash points at a commit /in the submodule/
You say:
 "This commit is local to your repository".  i.e. it points at a commit in
 the supermodule, which in turn implies that the local commit object points
 at a local tree and local parents.

My question is therefore: what are that local commit's tree and parents?
At the moment I am having difficulty understanding what meaningful 
things you could have in those fields.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01  0:01                                                           ` Andy Parkins
  2006-12-01  0:11                                                             ` Jakub Narebski
@ 2006-12-01  9:32                                                             ` Sven Verdoolaege
  2006-12-01 10:19                                                               ` Andy Parkins
  1 sibling, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-12-01  9:32 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

On Fri, Dec 01, 2006 at 12:01:54AM +0000, Andy Parkins wrote:
> On Thursday 2006, November 30 16:33, Sven Verdoolaege wrote:
> > > Well, I know what the commit is /that/ was all that was stored.  So I
> >
> > Then I have no idea what you are talking about.
> > A commit _contains_ all the history that led up to that commit,
> > so if you have the commit, then you also have the history.
> 
> It's not so much an actual commit, as a reference to a commit in another 
> repository.

This is heresy.  Any object referenced in a tree should be in the repo
(possibly via alternates).



* Re: [RFC] Submodules in GIT
  2006-12-01  8:49                                                 ` Andy Parkins
@ 2006-12-01  9:33                                                   ` Andreas Ericsson
  2006-12-01 10:38                                                     ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-01  9:33 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins wrote:
> On Thursday 2006 November 30 18:57, Andreas Ericsson wrote:
> 
> (agree with everything in your mail)
> 
>> The only problem I'm seeing atm is that the supermodule somehow has to
>> mark whatever commits it's using from the submodule inside the submodule
>> repo so that they effectively become un-prunable, otherwise the
>> supermodule may some day find itself with a history that it can't restore.
> 
> What about submodule/.git/refs/supermodule/commit12345678, where "12345678" is 
> the hash of the supermodule commit?  This gives a convenient route in the 
> submodule to which commit contains that commit from the submodule; but 
> doesn't write anything into the submodule repository itself.  It's just a tag 
> with a different intent.
> 

True, but this makes one repo of the submodule special. Let's say you 
have this layout

mozilla/.git
mozilla/openssl/.git
mozilla/xlat/.git

Now, we can be reasonably sure that the 'xlat' repo is something the 
mozilla core team can push to, or at least we can consider the core repo 
owners an official "vendor" of tags for the submodule repo. I'm fairly 
certain openssl authors won't be too happy with allowing the thousands 
of projects using its code to push tags to its official repo though.

Now that I think about it more, I realize this is completely irrelevant 
as the ui can create the tags in the submodule with info only from the 
supermodule, which means the submodule repo will only be special if 
it's connected to the supermodule. We just need a command for creating 
those tags in the submodule repo so people who use the same submodule 
code for several projects can use the alternates mechanism effectively.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-01  9:19                                                       ` Andy Parkins
@ 2006-12-01  9:57                                                         ` Martin Waitz
  2006-12-01 10:29                                                           ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01  9:57 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git


hoi :)

On Fri, Dec 01, 2006 at 09:19:04AM +0000, Andy Parkins wrote:
> Let's imagine a supermodule repository, and guess at it in more detail (I'll 
> abbreviate some of the less interesting output):
> 
> $ git-cat-file -p HEAD
> tree fb02e78085ecf2f29045603df858b5362e5bf8a4
> parent 4f2dba685507e4a8e07dac298c4024feaec6bd7d
> author Andy Parkins
> committer Andy Parkins 
> $ git-cat-file -p fb02e78085ecf2f29045603df858b5362e5bf8a4
> 100644 blob 46bd4e284a57e2faa539e7b72d62a38867075af5    Makefile
> 040000 tree 49ea01373a986a3db44d66702714aa75059ffa2c    doc
> 040000 subm d0a877464dc0198667a3e27ed3af8448ddacf947    libxcb

at the moment, it is:
  140000 commit ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7  libxcb

> The "subm" type is our new ODB object that's going to store whatever we will 
> need to access the submodule.  "libxcb" has already told us where this 
> submodule is in the supermodule tree.
> 
> $ git-cat-file -p d0a877464dc0198667a3e27ed3af8448ddacf947
> submodulecommithash ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7
> submoduleurlhint git://anongit.freedesktop.org/git/xcb/libxcb

So why do you need the url hint committed to the supermodule?
We don't store remote information in the object database, either.
Remember: this is still a distributed project, there is no one URL to
any submodule.

> I say:
>  submodulecommithash points at a commit /in the submodule/

But unluckily, this does not work.
You really have to be able to traverse the entire commit chain
from the supermodule into all submodules.

-- 
Martin Waitz



* Re: [RFC] Submodules in GIT
  2006-12-01  9:32                                                             ` Sven Verdoolaege
@ 2006-12-01 10:19                                                               ` Andy Parkins
  0 siblings, 0 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 10:19 UTC (permalink / raw)
  To: git, skimo

On Friday 2006 December 01 09:32, Sven Verdoolaege wrote:

> This is heresy.  Any object referenced in a tree should be in the repo
> (possibly via alternates).

The "submodule" object would be in the local repository.  That would refer to 
another object, and is merely part of the submodule object.  Just as  
the "Author" and "Committer" fields are part of the commit object but aren't 
actual objects in the tree.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01  9:57                                                         ` Martin Waitz
@ 2006-12-01 10:29                                                           ` Andy Parkins
  2006-12-01 10:42                                                             ` Sven Verdoolaege
  2006-12-01 11:31                                                             ` Martin Waitz
  0 siblings, 2 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 10:29 UTC (permalink / raw)
  To: git; +Cc: Martin Waitz

On Friday 2006 December 01 09:57, Martin Waitz wrote:

> So why do you need the url hint committed to the supermodule?
> We don't store remote information in the object database, either.

That's why it was a hint, probably configured when you first create the 
submodule connection.

> Remember: this is still a distributed project, there is no one URL to
> any submodule.

That point applies equally to your "tracking a submodule branch" point, except 
mine is only a URL hint, to help when first cloning that supermodule.  In 
truth, the clone will be perfectly able to get the submodule objects from the 
upstream supermodule, maintaining the distributed nature easily.

> > I say:
> >  submodulecommithash points at a commit /in the submodule/
>
> But unluckily, this does not work.

Eh?  "Not work", we're talking about code that doesn't even exist, of course 
it doesn't "work".   Do you mean "doesn't work if we're using my 
implementation of submodules"?  Well that hardly seems like a fair attack.

> You really have to be able to traverse the entire commit chain
> from the supermodule into all submodules.

You can: when you hit a submodule tree object you set GIT_DIR to that 
submodule and continue.  If you don't do it like that then you have stored 
submodule trees in the supermodule and it's no longer a separate repository.  
Why you'd want to - I have no idea.  What purpose would you have for 
traversing the commit chain into the submodules?  The commit in the submodule 
is just a note of where that submodule was during the supermodule commit in 
question.
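
A minimal sketch of that kind of traversal, using a toy model in which
each repository is its own object store and a supermodule commit lists the
(repository, commit) pair recorded for each submodule path. Switching the
store here plays the role of switching GIT_DIR; every name and id is made
up for illustration:

```python
def walk(stores, repo, commit, seen=None):
    """Collect (repo, commit) pairs reachable from a commit, following both
    parent links and recorded submodule commits."""
    seen = set() if seen is None else seen
    if (repo, commit) in seen:
        return seen
    seen.add((repo, commit))
    entry = stores[repo][commit]
    for parent in entry["parents"]:
        walk(stores, repo, parent, seen)
    # Crossing into a submodule means continuing in a different store.
    for sub_repo, sub_commit in entry["subs"].values():
        walk(stores, sub_repo, sub_commit, seen)
    return seen


stores = {
    "super": {
        "s1": {"parents": [], "subs": {"libxcb": ("super/libxcb", "x1")}},
        "s2": {"parents": ["s1"], "subs": {"libxcb": ("super/libxcb", "x2")}},
    },
    "super/libxcb": {
        "x1": {"parents": [], "subs": {}},
        "x2": {"parents": ["x1"], "subs": {}},
    },
}

result = walk(stores, "super", "s2")
print(sorted(result))
```

The submodule objects never have to live in the supermodule's store for
the walk to reach them; it only needs to know which store to look in.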

I notice though that you avoided my question: what does YOUR submodule object 
contain?  I really do want to know, as there is obviously a fundamental 
difference in what I think a submodule does and what you (and maybe everybody 
else) thinks a submodule does.  I'm perfectly willing to accept I'm wrong, 
but not without understanding how your method is going to work.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01  9:33                                                   ` Andreas Ericsson
@ 2006-12-01 10:38                                                     ` Andy Parkins
  0 siblings, 0 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 10:38 UTC (permalink / raw)
  To: git

On Friday 2006 December 01 09:33, Andreas Ericsson wrote:

> True, but this makes one repo of the submodule special. Let's say you
> have this layout

In a way, but it's information that doesn't need to be transmitted.

> mozilla/.git
> mozilla/openssl/.git
> mozilla/xlat/.git
>
> Now, we can be reasonably sure that the 'xlat' repo is something the
> mozilla core team can push to, or at least we can consider the core repo
> owners an official "vendor" of tags for the submodule repo. I'm fairly
> certain openssl authors won't be too happy with allowing the thousands
> of projects using its code to push tags to its official repo though.

No need, when cloning a supermodule, it will make those special tags 
automatically in the submodule repo.  They are only there to prevent prune 
from destroying those referenced commits after all.  If the submodule is 
cloned directly, they aren't needed anyway, and those objects won't be part 
of the dependency chain so wouldn't be downloaded.

> Now that I think about it more, I realize this is completely irrelevant
> as the ui can create the tags in the submodule with info only from the
> supermodule, which means the submodule repo will only be special if
> it's connected to the supermodule. We just need a command for creating
> those tags in the submodule repo so people who use the same submodule
> code for several projects can use the alternates mechanism effectively.

Is that even necessary?  git-clone of a supermodule will make those tags 
automatically.  If a submodule was alternative-cloned into a different 
supermodule, well then THAT supermodule would make the right tags for itself.  
Ah, I think I see what you mean now though, a method would be needed for 
creating those tags if we managed to manually get a submodule repository into
the supermodule - then supermodule-clone wouldn't have run.  Perhaps they 
could be checked for at commit time and recreated then?

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01 10:29                                                           ` Andy Parkins
@ 2006-12-01 10:42                                                             ` Sven Verdoolaege
  2006-12-01 11:02                                                               ` Andy Parkins
  2006-12-01 11:31                                                             ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-12-01 10:42 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Martin Waitz

On Fri, Dec 01, 2006 at 10:29:26AM +0000, Andy Parkins wrote:
> I notice though that you avoided my question: what does YOUR submodule object 
> contain?

He showed it to you in the example.  The "submodule object" is the COMMIT
of the submodule itself.



* Re: [RFC] Submodules in GIT
  2006-12-01  9:02                                               ` Andy Parkins
@ 2006-12-01 11:00                                                 ` Martin Waitz
  2006-12-01 12:09                                                   ` sf
  2006-12-02 12:48                                                   ` Jakub Narebski
  0 siblings, 2 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 11:00 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git


hoi :)

On Fri, Dec 01, 2006 at 09:02:48AM +0000, Andy Parkins wrote:
> On Thursday 2006 November 30 17:06, Martin Waitz wrote:
> 
> > You can easily have several topic-branches and merge updates from the
> > master branch.
> > otherwise you always have to remember which branch holds your current
> > contents from the supermodule.
> 
> WHAT?  I've got to make merges (that I don't necessarily want) in
> order to commit in the supermodule?  This completely negates any
> useful functioning of branches in the submodule.  I want to be able to
> make a quick development branch in the submodule and NOT merge that
> code into master and then be able to still commit that in the
> supermodule.

exactly!

Please think about it.

If you track HEAD, then this means that you track HEAD.
In _both_ directions!

So you not only store your submodule HEAD commit in the supermodule when you
do commit to the supermodule, it also means that your submodule HEAD
will be updated when you update your supermodule.
And what happens if you already committed something to HEAD in the mean
time? Exactly: a merge is needed.
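
The case distinction can be sketched with a toy ancestry check. This is an
illustrative model with made-up commit names, not the prototype's code:
the submodule HEAD can simply follow the supermodule when it has no
commits of its own, and needs a merge once both sides have moved.

```python
def ancestors(parents, commit):
    """Return the commit together with all of its ancestors."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def update_kind(parents, head, incoming):
    """Classify what updating HEAD to the incoming commit would require."""
    if incoming in ancestors(parents, head):
        return "already up to date"
    if head in ancestors(parents, incoming):
        return "fast-forward"
    return "merge needed"


# base is the commit last recorded in the supermodule; "local" was
# committed on the submodule HEAD, "upstream" arrived with the new
# supermodule version.
parents = {"base": [], "local": ["base"], "upstream": ["base"]}

print(update_kind(parents, "base", "upstream"))
print(update_kind(parents, "local", "upstream"))
```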

And you are right: you might not want to do this now, because you
branched off, because you _wanted_ to have some development which is
_independent_ of the current supermodule work.

So tracking HEAD really makes branching in the submodule hard to work
with.

What does the supermodule provide to the submodule? It stores one
reference to a commit sha1. Just like a reference inside refs/heads
inside the submodule. There really is not much difference between the
sha1 stored inside the supermodules tree and one stored inside refs/.
So from the submodule's point of view, the supermodule is not much more
than one special branch.
But it is not possible to use the supermodule index directly as one
"magic" branch for several reasons.
So we need synchronization methods between the index entry for the
submodule which is stored in the supermodule and the references in the
submodule. These are git-update-index/git-commit and git-checkout, both
called explicitly or implicitly in the supermodule.
And I really think it makes sense to have a one-to-one relationship
between the submodule "branch" stored in the supermodule and the
branchname used in the submodule.

> I think you're imagining the binding between the super and sub is very much 
> tighter than it should be.  What if I'm working on a development version of 
> the supermodule, which includes a stable version of the submodule?  Vice 
> versa?

I don't see your problem here.

> > When viewed from the supermodule, you are storing one branch per
> > submodule in your tree.
> 
> That prevents me "trying something out" on a topic branch in the submodule.  
> Here's a scenario using my suggested "supermodule tracks submodule HEAD" 
> method.
> 
>  * You're developerA
>  * Make a development branch in the supermodule
>  * In the submodule, make a whole load of topic branches
>  * Make a development branch in the submodule
>  * Merge the topic branches into the development branch of the submodule
>  * Commit in the supermodule.  This capture
>  * Tag that commit "my-tested-arrangement-of-submodule-features"
>  * Push that tag to the central repository - tell the world.
>  * DeveloperB checks out that tag and tries it.  Great stuff.

This is still supposed to be a distributed system.
DeveloperB does not only check out the whole project including several
modules. He is also supposed to _work_ with it.

What if DeveloperB also has several topic branches?
When he checks out the new supermodule, only his current HEAD in the
submodule will be updated.
So he first has to change to some supermodule-tracking branch inside the
submodule, then pull the supermodule updates, then eventually merge the
new contents of his supermodule-tracking branch into his topic branches.
So why not make this "let's update one supermodule-tracking-branch"
automatic?

> Now: here's the secret fact that I didn't tell you that will break
> your "supermodule tracks submodule branch" method.  DeveloperB has
> decided to have this in his remote:
>   Pull: refs/heads/master:refs/heads/upstream/master
> Oops. The supermodule, which has been told to track the "master"
> branch in the submodule is tracking different things in developerA's
> repository from developerB's repository.

So what? He can do whatever he wants to the repository.
He wants to change one submodule to a different branch?
He can do so!
But please do not expect the system to magically be able to resolve
problems. If you _by intent_ changed the submodule to another branch
which is incompatible with the one used in the submodule you can't expect
that this is magically merged.
This is the same as with normal files.
Sure you can replace one file with new contents that are different to
the one used by someone else.  Don't expect this can be merged
automatically. So now you have two forks/branches of the project.
So what?

Same for a system including submodules:
If you change one submodule to a totally different branch, then you
effectively forked/branched the entire project.
(Nomenclature is a bit difficult here: what I mean by totally different
branch is: the submodule commit tracked by the supermodule is not
directly connected to the one tracked by an old version of the
supermodule).

So whenever you introduce conflicting changes somewhere in the project
(be it in a submodule or in a file) you _always_ fork/branch the entire
project (i.e. the topmost supermodule).
You can't circumvent that.
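
As a rough illustration of the "directly connected" test described above,
here is a minimal Python sketch.  It reads "directly connected" as "one
commit is an ancestor of the other", which is an assumption; the commit
names and the PARENTS table are made up for the example and are not part
of the prototype.

```python
# Hypothetical toy history: each commit name maps to its list of parents.
PARENTS = {
    "s1": [],
    "s2": ["s1"],
    "s3": ["s2"],   # continues the line tracked by the supermodule
    "x1": [],       # unrelated root, e.g. a totally different branch
    "x2": ["x1"],
}


def is_ancestor(old, new, parents=PARENTS):
    """Return True if `old` is reachable from `new` via parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False


def directly_connected(a, b, parents=PARENTS):
    """True if one of the two commits descends from the other."""
    return is_ancestor(a, b, parents) or is_ancestor(b, a, parents)
```

With this toy history, moving the tracked commit from s1 to s3 stays on
the same line of development, while moving it from s2 to x2 would count
as forking the whole project.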

So what are submodule branches good for then?
To store other lines of development which are not yet / not any more
tracked by the supermodule.  Perhaps you store references to branches
stored in another supermodule, or another standalone repository.
Or a temporary branch which is only used for testing.
There are really many possibilities.
But they all have one thing in common: they are not meant to be tracked
by the supermodule.

> Branches are completely arbitrary per-repository.

Yes, but a submodule is special here: it really has one special branch.
The module is not independent any more.  That is the _nature_ of a
submodule.

> You cannot rely on them being consistent between different
> repositories.

Sure, we are in a distributed system.
But the supermodule always has to know which branch in the submodule has
to be tracked.  The easiest thing is to always use the default
refs/heads/master.  Surely this could be changed if there is a need.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 10:42                                                             ` Sven Verdoolaege
@ 2006-12-01 11:02                                                               ` Andy Parkins
  2006-12-01 11:10                                                                 ` Sven Verdoolaege
  2006-12-01 11:46                                                                 ` Martin Waitz
  0 siblings, 2 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 11:02 UTC (permalink / raw)
  To: git

On Friday 2006 December 01 10:42, Sven Verdoolaege wrote:

> He showed it to you in the example.  The "submodule object" is the COMMIT
> of the submodule itself.

That's no different from mine.  I need more detail than that.

Is that commit in the submodule or the supermodule?  If it's in the submodule 
then we're talking about the same thing, as that's all I want.  If it's in 
the supermodule then I want to know what the tree object that that commit 
points to contains.  I also want to know how we tell the difference between a 
commit-in-supermodule and a 
commit-in-supermodule-which-is-actually-in-submodule.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01 11:02                                                               ` Andy Parkins
@ 2006-12-01 11:10                                                                 ` Sven Verdoolaege
  2006-12-01 11:45                                                                   ` sf
  2006-12-01 12:12                                                                   ` Andy Parkins
  2006-12-01 11:46                                                                 ` Martin Waitz
  1 sibling, 2 replies; 252+ messages in thread
From: Sven Verdoolaege @ 2006-12-01 11:10 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

On Fri, Dec 01, 2006 at 11:02:15AM +0000, Andy Parkins wrote:
> On Friday 2006 December 01 10:42, Sven Verdoolaege wrote:
> 
> > He showed it to you in the example.  The "submodule object" is the COMMIT
> > of the submodule itself.
> 
> That's no different from mine.  I need more detail than that.

You were proposing to create an extra object containing some random value
that is disconnected from the repo.

> Is that commit in the submodule or the supermodule?

It's in BOTH.  That's why it's a *sub*module.

Someone else can try to explain it to you.



* Re: [RFC] Submodules in GIT
  2006-12-01 10:29                                                           ` Andy Parkins
  2006-12-01 10:42                                                             ` Sven Verdoolaege
@ 2006-12-01 11:31                                                             ` Martin Waitz
  2006-12-01 12:20                                                               ` Andy Parkins
  1 sibling, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 11:31 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3963 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 10:29:26AM +0000, Andy Parkins wrote:
> On Friday 2006 December 01 09:57, Martin Waitz wrote:
> 
> > So why do you need the url hint committed to the supermodule?
> > We don't store remote information in the object database, too.
> 
> That's why it was a hint, probably configured when you first create the 
> submodule connection.
> 
> In truth, the clone will be perfectly able to get the submodule
> objects from the upstream supermodule, maintaining the distributed
> nature easily.

that's exactly the reason why the hint is not needed.
Although you need to have one common project object database, storing the
objects of all modules.

> > > I say:
> > >  submodulecommithash points at a commit /in the submodule/
> >
> > But unluckily, this does not work.
> 
> Eh?  "Not work", we're talking about code that doesn't even exist, of
> course it doesn't "work".   Do you mean "doesn't work if we're using
> my implementation of submodules"?  Well that hardly seems like a fair
> attack.

Well, at first I started exactly as you described: only store the
submodule commit sha1 in the parent somewhere, but don't traverse it.
So this is a fair attack: your implementation already exists in
http://git.admingilde.org/tali/git.git/module ;-)
(ok, yes, it really is different to what you described as I stored the
sha1 differently, but I really learned that it is important to be able
to traverse the entire commit chain, from the root of the project to the
deepest submodule.)

> > You really have to be able to traverse the entire commit chain
> > from the supermodule into all submodules.
> 
> You can: when you hit a submodule tree object you set GIT_DIR to that
> submodule and continue.  If you don't do it like that then you have
> stored submodule trees in the supermodule and it's no longer a
> separate repository.

Well, a submodule repository _is_ special in some ways:
fsck and prune have to take the references from the supermodule into
account.  In this sense it is _not_ separate from the supermodule.

I think it is important for the submodule repository to be independent
in other ways than its object database:  you should be able to exchange
commits with other repositories (be they stand-alone or a submodule in
another supermodule).  You should be able to use log/diff/blame/whatever
inside the submodule.

All this does not need an object database of its own.
So I chose to do it the easy way and use one object database for the
entire project - and disallow git-prune in a submodule.
There may be other/better ways to do this, but you have to be able
to access all objects which belong to the project inside the toplevel
project repository.

> Why you'd want to - I have no idea.  What
> purpose would you have for traversing the commit chain into the
> submodules?  The commit in the submodule is just a note of where that
> submodule was during the supermodule commit in question.

Things get much simpler if you have one big graph of objects.

clone and especially fetch/pull naturally work at once.
You can ask for all objects inside the whole project which are needed to
be transferred between project version A and B, including all submodules.
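
A small Python sketch of that idea, under the assumption that the whole
project is modelled as one graph in which a supermodule tree can point
straight at a submodule commit.  All object names below are hypothetical
and the model ignores packing and protocol details.

```python
# Hypothetical object graph: object name -> names it references.
# Commits reference their tree and parents; a supermodule tree may
# reference blobs and, in this model, a submodule commit.
OBJECTS = {
    "super-c1": ["super-t1"],
    "super-t1": ["blob-a", "sub-c1"],
    "super-c2": ["super-t2", "super-c1"],
    "super-t2": ["blob-a", "sub-c3"],
    "sub-c1": ["sub-t1"],
    "sub-c2": ["sub-t2", "sub-c1"],
    "sub-c3": ["sub-t3", "sub-c2"],
    "sub-t1": ["blob-x1"],
    "sub-t2": ["blob-x2"],
    "sub-t3": ["blob-x3"],
    "blob-a": [],
    "blob-x1": [],
    "blob-x2": [],
    "blob-x3": [],
}


def reachable(start, objects=OBJECTS):
    """Collect every object reachable from `start`, submodules included."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(objects[name])
    return seen


def objects_to_transfer(have, want, objects=OBJECTS):
    """Objects needed to go from project version `have` to `want`."""
    return reachable(want, objects) - reachable(have, objects)
```

Going from super-c1 to super-c2 in this model pulls in the two new
submodule commits and their trees and blobs in the same walk, without a
separate fetch per submodule.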

You can even have one bare repository for the whole project.

> I notice though that you avoided my question: what does YOUR submodule
> object contain?  I really do want to know, as there is obviously a
> fundamental difference in what I think a submodule does and what you
> (and maybe everybody else) thinks a submodule does.

It really only stores the commit of the submodule directly.
So there is no new submodule object type.  The parent has a direct link
to the submodule commit in his tree object and in its index.  In order
to separate them from normal files or normal subdirectories, they get a
special mode: they are represented as a socket.
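
To make the mode trick concrete, here is a minimal Python sketch of how a
tool could tell the entry kinds apart from the mode bits alone.  The
constants mirror the usual POSIX file-type values; treating the socket
type as "submodule" is taken from the description above, and the function
name is made up for illustration.

```python
# File-type bits as they appear in tree entry modes (octal), mirroring
# the usual POSIX values. Using S_IFSOCK for submodules is the choice
# described in the mail, not a general rule.
S_IFMT = 0o170000    # mask selecting the file-type bits
S_IFSOCK = 0o140000  # socket: marks a submodule commit in this prototype
S_IFLNK = 0o120000   # symbolic link
S_IFREG = 0o100000   # regular file (blob)
S_IFDIR = 0o040000   # directory (tree)


def classify_entry(mode):
    """Map a tree entry mode to the kind of object it points to."""
    kind = mode & S_IFMT
    if kind == S_IFSOCK:
        return "submodule"
    if kind == S_IFDIR:
        return "tree"
    if kind == S_IFLNK:
        return "symlink"
    if kind == S_IFREG:
        return "blob"
    return "unknown"
```

Because the distinction lives in the mode, the entry's hash can simply be
the submodule commit id and no new object type is required.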

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 11:10                                                                 ` Sven Verdoolaege
@ 2006-12-01 11:45                                                                   ` sf
  2006-12-01 12:12                                                                   ` Andy Parkins
  1 sibling, 0 replies; 252+ messages in thread
From: sf @ 2006-12-01 11:45 UTC (permalink / raw)
  To: git

Sven Verdoolaege wrote:
> On Fri, Dec 01, 2006 at 11:02:15AM +0000, Andy Parkins wrote:
>> On Friday 2006 December 01 10:42, Sven Verdoolaege wrote:
>> 
>> > He showed it to you in the example.  The "submodule object" is the COMMIT
>> > of the submodule itself.
>> 
>> That's no different from mine.  I need more detail than that.
> 
> You were proposing to create an extra object containing some random value
> that is disconnected from the repo.
> 
>> Is that commit in the submodule or the supermodule?
> 
> It's in BOTH.  That's why it's a *sub*module.

I would say it is only in the supermodule because that is the branch you 
are working on. If you are working on the submodule in an independent 
branch then you can pull from the submodule commit. But you do not want 
to pull the supermodule commit itself but only the commit in path libxcb 
(see my proposed syntax).

Regards

Stephan


* Re: [RFC] Submodules in GIT
  2006-12-01 11:02                                                               ` Andy Parkins
  2006-12-01 11:10                                                                 ` Sven Verdoolaege
@ 2006-12-01 11:46                                                                 ` Martin Waitz
  2006-12-01 12:16                                                                   ` Andy Parkins
  1 sibling, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 11:46 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 851 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 11:02:15AM +0000, Andy Parkins wrote:
> On Friday 2006 December 01 10:42, Sven Verdoolaege wrote:
> 
> > He showed it to you in the example.  The "submodule object" is the COMMIT
> > of the submodule itself.
> 
> That's no different from mine.

Well, there simply is no proxy object in between.

> Is that commit in the submodule or the supermodule?

Well, logically that commit belongs to the submodule and is referenced
by the tree in the supermodule.
Physically it is stored in the project's object database which is
shared between the supermodule and all submodules (at least in my
implementation).

> I also want to know how we tell the difference between a
> commit-in-supermodule and a
> commit-in-supermodule-which-is-actually-in-submodule.

There is no difference.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-11-30 18:57                                               ` Andreas Ericsson
  2006-12-01  8:49                                                 ` Andy Parkins
@ 2006-12-01 12:03                                                 ` sf
  2006-12-01 12:11                                                   ` Martin Waitz
  2006-12-05  9:01                                                 ` Uwe Kleine-Koenig
  2 siblings, 1 reply; 252+ messages in thread
From: sf @ 2006-12-01 12:03 UTC (permalink / raw)
  To: git

Andreas Ericsson wrote:
...
> The only problem I'm seeing atm is that the supermodule somehow has to 
> mark whatever commits it's using from the submodule inside the submodule 
> repo so that they effectively become un-prunable, otherwise the 
> supermodule may some day find itself with a history that it can't restore.

That has nothing to do with submodules. What you state here is the 
problem of alternate repositories.

There are two solutions:

1. Do not use alternates.

2. Do not prune a repository that is used as an alternate repository by 
other repositories.

For the submodule discussion that would mean:

1. Only fetch and work on branches of submodules you are interested in. 
It does not matter that the origin repository contains (probably orders 
of magnitude) more data. You will never touch that.

2. You can never prune the main (the supermodule's) repository, at least 
not with what git provides today.

That is why the sanest approach to subprojects is to put commits into 
tree objects, define a way to name these commits and make git understand 
these new commit names. Done. Works.

Regards

Stephan


* Re: [RFC] Submodules in GIT
  2006-12-01 11:00                                                 ` Martin Waitz
@ 2006-12-01 12:09                                                   ` sf
  2006-12-01 12:12                                                     ` Martin Waitz
  2006-12-02 12:48                                                   ` Jakub Narebski
  1 sibling, 1 reply; 252+ messages in thread
From: sf @ 2006-12-01 12:09 UTC (permalink / raw)
  To: git

Martin Waitz wrote:
...
> So you not only store your submodule HEAD commit in the supermodule when you
> do commit to the supermodule, it also means that your submodule HEAD
> will be updated when you update your supermodule.

Why the magic? The typical workflow in git is

1. You work on a branch, i.e. edit and commit and so on.
2. At some point, you decide to share the work you did on that branch 
(e-mail a patch, merge into another branch, push upstream or let it be 
pulled by upstream)

I fail to understand why these two steps have to be mixed up. Someone 
care to explain?

Regards

Stephan


* Re: [RFC] Submodules in GIT
  2006-12-01 12:03                                                 ` sf
@ 2006-12-01 12:11                                                   ` Martin Waitz
  2006-12-01 13:21                                                     ` sf
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 12:11 UTC (permalink / raw)
  To: sf; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 01:03:48PM +0100, sf wrote:
> Andreas Ericsson wrote:
> 2. You can never prune the main (the supermodule's) repository, at least 
> not with what git provides today.

It even already works (well, not with what git provides today, but with
my implementation). git-prune simply walks all the submodules, too, when
doing its reachability analysis.

What does not work is a prune inside the submodule, because it does not
know about all the commits used by the supermodule.
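
The difference between the two prune cases can be sketched in Python as a
plain mark-and-sweep over a shared object store.  This is only a model of
the reachability argument, not the actual git-prune code; the object
names are hypothetical and trees and blobs of the submodule are left out
to keep it short.

```python
# Hypothetical shared object store: object name -> names it references.
OBJECTS = {
    "super-c1": ["super-t1"],
    "super-t1": ["sub-c2"],              # old version links sub-c2
    "super-c2": ["super-t2", "super-c1"],
    "super-t2": ["sub-c3"],              # new version links sub-c3
    "sub-c1": [],
    "sub-c2": ["sub-c1"],   # only referenced by the old supermodule tree
    "sub-c3": ["sub-c1"],   # current tip of the submodule branch
}


def mark(roots, objects):
    """Return the set of objects reachable from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(objects[name])
    return seen


def prune(roots, objects):
    """Keep only objects reachable from `roots`, dropping the rest."""
    keep = mark(roots, objects)
    return {name: refs for name, refs in objects.items() if name in keep}


# Pruning from the supermodule ref walks into the submodules too.
after_super_prune = prune(["super-c2"], OBJECTS)
# Pruning from the submodule's own ref misses what old supermodule
# versions still need.
after_sub_prune = prune(["sub-c3"], OBJECTS)
```

In this model the supermodule-level prune keeps everything, while the
submodule-level prune throws away sub-c2 even though super-c1 still
refers to it.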

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 11:10                                                                 ` Sven Verdoolaege
  2006-12-01 11:45                                                                   ` sf
@ 2006-12-01 12:12                                                                   ` Andy Parkins
  2006-12-01 12:28                                                                     ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 12:12 UTC (permalink / raw)
  To: git

On Friday 2006 December 01 11:10, Sven Verdoolaege wrote:

> You were proposing to create an extra object containing some random value
> that is disconnected from the repo.

Right, I think I've finally understood what Martin (and you) are proposing.  
You want every commit in the submodule to be propagated up to the supermodule 
as well.  Okay.

I don't think it's right, but at least I understand.

It seems wrong because it's making commits in the supermodule that aren't 
commits to do with that project.  In my libxcb example; why should every 
project that uses libxcb have to store the entire history of libxcb?  When 
examining the supermodule history, I won't care about how libxcb got to the 
state it's in, and it's just noise in the supermodule history.  What if I use 
10 submodules, the supermodule history won't show you anything useful - it's 
just unrelated submodule commits.

It gets worse, this is why I was asking for more detail: this commit that 
you're storing in the supermodule.  It's the same commit as is in the 
submodule?  What would the parent commit of that commit be?  It has to be the 
same in both, because the commit-hash forces it to be.

The only possibility would be that it's NOT the same hash in both, because the 
parents in the supermodule are inapplicable to the submodule, and the parent 
in the submodule is independent from the supermodule.  That means you have to 
store two commits: one for the submodule commit and one for the supermodule 
commit.  So what are you going to write in the supermodule commit?  Answer: a 
submodule commit hash - exactly as I said.

> > Is that commit in the submodule or the supermodule?
>
> It's in BOTH.  That's why it's a *sub*module.

If it's in BOTH then the supermodule is a normal git repository.  You aren't 
tracking the submodule, you're just including it en masse.  Using semantics 
to justify a position isn't a very strong argument, calling it a "sub" module 
is just an easy bit of naming for us to hang the discussion on, it isn't 
necessarily a mathematical subset and superset.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01 12:09                                                   ` sf
@ 2006-12-01 12:12                                                     ` Martin Waitz
  2006-12-01 13:05                                                       ` sf
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 12:12 UTC (permalink / raw)
  To: sf; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 662 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 01:09:49PM +0100, sf wrote:
> Martin Waitz wrote:
> ...
> >So you not only store your submodule HEAD commit in the supermodule when 
> >you
> >do commit to the supermodule, it also means that your submodule HEAD
> >will be updated when you update your supermodule.
> 
> Why the magic? The typical workflow in git is
> 
> 1. You work on a branch, i.e. edit and commit and so on.
> 2. At some point, you decide to share the work you did on that branch 
> (e-mail a patch, merge into another branch, push upstream or let it be 
> pulled by upstream)

3. Other people want to use your new work.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 11:46                                                                 ` Martin Waitz
@ 2006-12-01 12:16                                                                   ` Andy Parkins
  2006-12-01 12:34                                                                     ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 12:16 UTC (permalink / raw)
  To: git; +Cc: Martin Waitz

On Friday 2006 December 01 11:46, Martin Waitz wrote:

> > That's no different from mine.
>
> Well, there simply is no proxy object in between.

That's fine, I was only using the proxy object to allow additional information 
into the submodule object.  Actually, I think it would always be better to 
use a proxy object otherwise you have an error in the tree object, because it 
will refer to an object that does not exist.  The proxy object is allowed to 
refer to objects that don't exist because it's not a tree object.

> > Is that commit in the submodule or the supermodule?
>
> Well, logically that commit belongs to the submodule and is referenced
> by the tree in the supermodule.
> Physically it is stored in the project's object database which is
> shared between the supermodule and all submodules (at least in my
> implementation).

Hmmm, "shared"?  It must still be in the submodule physically though, and 
presumably the supermodule uses alternates to get access to it?  Otherwise 
the submodule will be impossible to separate from the supermodule.

> > I also want to know how we tell the difference between a
> > commit-in-supermodule and a
> > commit-in-supermodule-which-is-actually-in-submodule.
>
> There is no difference.

Okay.  I think I'm still a bit lost then.  I suppose I'll wait for your 
patches to understand.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01 11:31                                                             ` Martin Waitz
@ 2006-12-01 12:20                                                               ` Andy Parkins
  2006-12-01 12:37                                                                 ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 12:20 UTC (permalink / raw)
  To: git

On Friday 2006 December 01 11:31, Martin Waitz wrote:

> It really only stores the commit of the submodule directly.
> So there is no new submodule object type.  The parent has a direct link
> to the submodule commit in his tree object and in its index.  In order
> to separate them from normal files or normal subdirectories, they get a
> special mode: they are represented as socket.

Okay.  I think I've got it now.  I'm not convinced that the way you've chosen 
is the correct way, primarily because the separation between supermodule and 
submodule is not strong.  Regardless, as you're doing it, you get to pick :-) 
Is there a public repository I can look at to see what you've done?  I'm 
interested in the sort of plumbing changes needed to make something like this 
work.


Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [RFC] Submodules in GIT
  2006-12-01 12:12                                                                   ` Andy Parkins
@ 2006-12-01 12:28                                                                     ` Martin Waitz
  2006-12-01 14:11                                                                       ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 12:28 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3340 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 12:12:34PM +0000, Andy Parkins wrote:
> On Friday 2006 December 01 11:10, Sven Verdoolaege wrote:
> 
> > You were proposing to create an extra object containing some random value
> > that is disconnected from the repo.
> 
> Right, I think I've finally understood what Martin (and you) are
> proposing.  You want every commit in the submodule to be propagated up
> to the supermodule as well.  Okay.
> 
> I don't think it's right, but at least I understand.

Please note that the submodule commits are not part of the supermodule
commit chain, they are part of the supermodule _tree_.

> It seems wrong because it's making commits in the supermodule that aren't 
> commits to do with that project.

Of course they are part of your project, just like all the tree and blob
objects, too.

> In my libxcb example; why should every project that uses libxcb have to
> store the entire history of libxcb?

Because you want to be able to use the submodule as a repository of its
own, too.  Be able to look at its history if you want to.
Be able to merge with new versions of the submodule.
This is what distinguishes a submodule from a pure file-based import of
another project.

> When examining the supermodule history, I won't care about how libxcb
> got to the state it's in, and it's just noise in the supermodule
> history.  What if I use 10 submodules, the supermodule history won't
> show you anything useful - it's just unrelated submodule commits.

Again: the submodules are part of your supermodule _tree_, not its
commit chain.  So you won't see the submodule commits when you invoke
git-log in the supermodule.

> It gets worse, this is why I was asking for more detail: this commit
> that you're storing in the supermodule.  It's the same commit as is in
> the submodule?

It is _the_ commit from the submodule, yes.

> What would the parent commit of that commit be?  It has to be the same
> in both, because the commit-hash forces it to be.

It is the commit of the submodule, so its parents point to the submodule
history.
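
The point about the hash can be checked with a short Python sketch.  A
git object id is the SHA-1 of a "<type> <size>" header, a NUL byte and
the object body, and a commit body lists its parents, so the same commit
id always implies the same parents.  The helper names, the identity
string and the parent ids below are made up for illustration.

```python
import hashlib

# Id of the empty tree, used here only to have a valid tree line.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical parent ids, not real commits.
PARENT_A = "1" * 40
PARENT_B = "2" * 40


def git_object_id(obj_type, body):
    """SHA-1 over the '<type> <size>' header, a NUL byte and the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree, parents, message,
              who="A U Thor <author@example.com> 0 +0000"):
    """Build a commit body in git's text layout and return its id."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return git_object_id("commit", body)
```

Two commits with the same tree and message but different parents get
different ids here, so a submodule commit linked from a supermodule tree
can only ever carry its submodule parents.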

> > > Is that commit in the submodule or the supermodule?
> >
> > It's in BOTH.  That's why it's a *sub*module.
> 
> If it's in BOTH then the supermodule is a normal git repository.  You aren't 
> tracking the submodule, you're just including it en masse.

The submodule is part of the entire project, so yes, it is included.
And the supermodule tracks submodule development by storing references
to the submodule history that was used at that time.


Let's try to paint a little diagram:


belonging to:
/--------- supermodule -------\    /---- submodule -------\

commit -> tree +-> blob
  |            +-> tree -> ...
  |            +-----------------> commit -> tree -> ...
  v                                  |
commit -> tree +-> ...               v
  |            +-----------------> commit -> ...
  |                                  |
  |                                  v
  |                                commit -> ...
  v                                  |
commit -> tree +-> ...               v
               +-----------------> commit


Both have their independent history, but they are linked as some
submodule versions are part of the supermodule tree.
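
The diagram can also be written down as a tiny Python model, which shows
why a log of the supermodule never lists submodule commits: the history
walk follows parent links only, and the submodule commit hangs off the
tree.  The Commit class, the names S1..S3 and M1..M4, and the single
`submodule` field standing in for the tree entry are all simplifications
invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Commit:
    name: str
    parents: List["Commit"] = field(default_factory=list)
    # Stand-in for the tree entry that links to a submodule commit.
    submodule: Optional["Commit"] = None


def log(tip):
    """Walk history from `tip` following parent links only."""
    names, stack, seen = [], [tip], set()
    while stack:
        commit = stack.pop()
        if commit.name in seen:
            continue
        seen.add(commit.name)
        names.append(commit.name)
        stack.extend(commit.parents)
    return names


# Submodule history: M4 is the newest commit, M1 the oldest.
m1 = Commit("M1")
m2 = Commit("M2", [m1])   # never linked from the supermodule
m3 = Commit("M3", [m2])
m4 = Commit("M4", [m3])

# Supermodule history: each version links one submodule commit.
s1 = Commit("S1", [], m1)
s2 = Commit("S2", [s1], m3)
s3 = Commit("S3", [s2], m4)
```

Here log(s3) yields only S3, S2 and S1, while log(s3.submodule) yields
the full submodule history, including M2, which no supermodule version
links to directly.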

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 12:16                                                                   ` Andy Parkins
@ 2006-12-01 12:34                                                                     ` Martin Waitz
  2006-12-01 13:59                                                                       ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 12:34 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1585 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 12:16:00PM +0000, Andy Parkins wrote:
> That's fine, I was only using the proxy object to allow additional
> information into the submodule object.  Actually, I think it would
> always be better to use a proxy object otherwise you have an error in
> the tree object, because it will refer to an object that does not
> exist.  The proxy object is allowed to refer to objects that don't
> exist because it's not a tree object.

It is exactly the aim of my implementation to not have any reference to
something that is not accessible in the supermodule repository.

> > > Is that commit in the submodule or the supermodule?
> >
> > Well, logically that commit belongs to the submodule and is referenced
> > by the tree in the supermodule.
> > Physically it is stored in the project's object database which is
> > shared between the supermodule and all submodules (at least in my
> > implementation).
> 
> Hmmm, "shared"?  It must still be in the submodule physically though,
> and presumably the supermodule uses alternates to get access to it?
> Otherwise the submodule will be impossible to separate from the
> supermodule.

Yes, you can't separate it by just moving it out of the supermodule,
but you can always clone the submodule alone.

> Okay.  I think I'm still a bit lost then.  I suppose I'll wait for your
> patches to understand.

have a look at http://git.admingilde.org/tali/git.git/module2.
If you want to try it out, have a look at t/t7500-submodule.sh on how to
create submodules.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 12:20                                                               ` Andy Parkins
@ 2006-12-01 12:37                                                                 ` Martin Waitz
  2006-12-02 15:16                                                                   ` Jakub Narebski
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 12:37 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 12:20:42PM +0000, Andy Parkins wrote:
> Is there a public repository I can look at to see what you've done?
> I'm interested in the sort of plumbing changes needed to make
> something like this work.

link is in the mail that started this thread ;-).

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 12:12                                                     ` Martin Waitz
@ 2006-12-01 13:05                                                       ` sf
  2006-12-01 13:35                                                         ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: sf @ 2006-12-01 13:05 UTC (permalink / raw)
  To: git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 01:09:49PM +0100, sf wrote:
>> Martin Waitz wrote:
>> ...
>> >So you not only store your submodule HEAD commit in the supermodule when 
>> >you
>> >do commit to the supermodule, it also means that your submodule HEAD
>> >will be updated when you update your supermodule.
>> 
>> Why the magic? The typical workflow in git is
>> 
>> 1. You work on a branch, i.e. edit and commit and so on.
>> 2. At some point, you decide to share the work you did on that branch 
>> (e-mail a patch, merge into another branch, push upstream or let it be 
>> pulled by upstream)
> 
> 3. Other people want to use your new work.

Sorry, if that was not obvious: You actually proceed with one of the 
options I listed in Step 2. What I wanted to state is that with git you 
do not mix up committing (which is local to your repository and your 
branch) and publishing.

Regards

Stephan


* Re: [RFC] Submodules in GIT
  2006-12-01 12:11                                                   ` Martin Waitz
@ 2006-12-01 13:21                                                     ` sf
  2006-12-01 13:43                                                       ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: sf @ 2006-12-01 13:21 UTC (permalink / raw)
  To: git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 01:03:48PM +0100, sf wrote:
>> Andreas Ericsson wrote:
>> 2. You can never prune the main (the supermodule's) repository, at least 
>> not with what git provides today.
> 
> It even already works (well, not with what git provides today, but with
> my implementation). git-prune simply walks all the submodules, too, when
> doing its reachability analysis.
> 
> What does not work is a prune inside the submodule, because it does not
> know about all the commits used by the supermodule.

I just had a short (really short) look at your work. My impression is 
that your repository setup is much too complicated.

As I proposed elsewhere: For submodules to work you only need to allow 
commits in tree objects (that is what your implementation requires as 
well). Everything else is in the tools. Much simpler.

Regards

Stephan

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:05                                                       ` sf
@ 2006-12-01 13:35                                                         ` Martin Waitz
  2006-12-01 13:43                                                           ` Andreas Ericsson
  2006-12-01 13:51                                                           ` Stephan Feder
  0 siblings, 2 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 13:35 UTC (permalink / raw)
  To: sf; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 02:05:33PM +0100, sf wrote:
> >On Fri, Dec 01, 2006 at 01:09:49PM +0100, sf wrote:
> >>Martin Waitz wrote:
> >>>So you not only store your submodule HEAD commit in the supermodule
> >>>when you do commit to the supermodule, it also means that your
> >>>submodule HEAD will be updated when you update your supermodule.
> >>
> >>Why the magic? The typical workflow in git is
> >>
> >>1. You work on a branch, i.e. edit and commit and so on.
> >>2. At some point, you decide to share the work you did on that branch 
> >>(e-mail a patch, merge into another branch, push upstream or let it be 
> >>pulled by upstream)
> >
> >3. Other people want to use your new work.
> 
> Sorry, if that was not obvious: You actually proceed with one of the 
> options I listed in Step 2. What I wanted to state is that with git you 
> do not mix up committing (which is local to your repository and your 
> branch) and publishing.

I guess you are referring to not mixing up committing to the submodule and
updating the supermodule index.
These are really two separate steps; I just combined them above because I
wanted to put emphasis on the other part: it is not a one-way flow, it
is bidirectional, so your HEAD would have to change if the supermodule
gets updated.
And I consider changing HEAD, without looking at the branch it points
to, to be a bad thing.

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:21                                                     ` sf
@ 2006-12-01 13:43                                                       ` Martin Waitz
  2006-12-01 14:23                                                         ` Stephan Feder
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 13:43 UTC (permalink / raw)
  To: sf; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 750 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 02:21:20PM +0100, sf wrote:
> I just had a short (really short) look at your work. My impression is 
> that your repository setup is much too complicated.

Well, I'm not really satisfied with the UI part.
What exactly do you find complicated?

> As I proposed elsewhere: For submodules to work you only need to allow 
> commits in tree objects (that is what your implementation requires as 
> well). Everything else is in the tools. Much simpler.

I do not quite get your point.
The core of my work allows putting commits into tree objects.
Then there is some more (but not quite finished) work to make the tools
work together with submodules.  So no, not everything is there yet.

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:35                                                         ` Martin Waitz
@ 2006-12-01 13:43                                                           ` Andreas Ericsson
  2006-12-01 13:46                                                             ` Martin Waitz
  2006-12-01 13:51                                                           ` Stephan Feder
  1 sibling, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-01 13:43 UTC (permalink / raw)
  To: Martin Waitz; +Cc: sf, git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 02:05:33PM +0100, sf wrote:
>>> On Fri, Dec 01, 2006 at 01:09:49PM +0100, sf wrote:
>>>> Martin Waitz wrote:
>>>>> So you not only store your submodule HEAD commit in the supermodule
>>>>> when you do commit to the supermodule, it also means that your
>>>>> submodule HEAD will be updated when you update your supermodule.
>>>> Why the magic? The typical workflow in git is
>>>>
>>>> 1. You work on a branch, i.e. edit and commit and so on.
>>>> 2. At some point, you decide to share the work you did on that branch 
>>>> (e-mail a patch, merge into another branch, push upstream or let it be 
>>>> pulled by upstream)
>>> 3. Other people want to use your new work.
>> Sorry, if that was not obvious: You actually proceed with one of the 
>> options I listed in Step 2. What I wanted to state is that with git you 
>> do not mix up committing (which is local to your repository and your 
>> branch) and publishing.
> 
> I guess you are referring to not mixing up committing to the submodule and
> updating the supermodule index.
> These are really two separate steps; I just combined them above because I
> wanted to put emphasis on the other part: it is not a one-way flow, it
> is bidirectional, so your HEAD would have to change if the supermodule
> gets updated.
> And I consider changing HEAD, without looking at the branch it points
> to, to be a bad thing.
> 

So a commit in the supermodule turns into a commit in the submodule? 
That's just plain wrong. If it doesn't, why would the submodule HEAD 
have to change?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:43                                                           ` Andreas Ericsson
@ 2006-12-01 13:46                                                             ` Martin Waitz
  2006-12-01 14:52                                                               ` Andreas Ericsson
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 13:46 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: sf, git

[-- Attachment #1: Type: text/plain, Size: 395 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 02:43:16PM +0100, Andreas Ericsson wrote:
> So a commit in the supermodule turns into a commit in the submodule? 

no.

> If it doesn't, why would the submodule HEAD have to change?

So how do you update your submodule?

Remember: if you git-pull in the supermodule, you want to update the
whole thing, including all submodules.

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:35                                                         ` Martin Waitz
  2006-12-01 13:43                                                           ` Andreas Ericsson
@ 2006-12-01 13:51                                                           ` Stephan Feder
  2006-12-01 14:58                                                             ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Stephan Feder @ 2006-12-01 13:51 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 02:05:33PM +0100, sf wrote:
>> >On Fri, Dec 01, 2006 at 01:09:49PM +0100, sf wrote:
>> >>Martin Waitz wrote:
>> >>>So you not only store your submodule HEAD commit in the supermodule
>> >>>when you do commit to the supermodule, it also means that your
>> >>>submodule HEAD will be updated when you update your supermodule.
>> >>
>> >>Why the magic? The typical workflow in git is
>> >>
>> >>1. You work on a branch, i.e. edit and commit and so on.
>> >>2. At some point, you decide to share the work you did on that branch 
>> >>(e-mail a patch, merge into another branch, push upstream or let it be 
>> >>pulled by upstream)
>> >
>> >3. Other people want to use your new work.
>> 
>> Sorry, if that was not obvious: You actually proceed with one of the 
>> options I listed in Step 2. What I wanted to state is that with git you 
>> do not mix up committing (which is local to your repository and your 
>> branch) and publishing.
> 
> I guess you are referring to not mixing up committing to the submodule and
> updating the supermodule index.

The opposite: If you work in the supermodule, even if it is in the code 
of the submodule, you only commit to the supermodule. The submodule does 
not "know" about these changes after step 1.

> These are really two separate steps; I just combined them above because I
> wanted to put emphasis on the other part: it is not a one-way flow, it
> is bidirectional, so your HEAD would have to change if the supermodule
> gets updated.

Why do you mix up supermodule and submodule? The way I see your proposal 
you cannot change submodule and supermodule independently. That is a 
huge drawback.

Regards


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 12:34                                                                     ` Martin Waitz
@ 2006-12-01 13:59                                                                       ` Andy Parkins
  2006-12-01 14:07                                                                         ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 13:59 UTC (permalink / raw)
  To: git

On Friday 2006 December 01 12:34, Martin Waitz wrote:

> It is exactly the aim of my implementation to not have any reference to
> something that is not accessible in the supermodule repository.

Okay - I think you've put me right in another reply on this point - the 
submodule commit is in the supermodule; that was the part I hadn't got.

> Yes, you can't separate it by just moving it out of the supermodule,
> but you can always clone the submodule alone.

Ah - now that clarifies things a lot.  The fact that you can't separate it by 
moving it implies lots of things that take away many of my earlier worries.

> have a look at http://git.admingilde.org/tali/git.git/module2.
> If you want to try it out, have a look at t/t7500-submodule.sh on how to
> create submodules.

Thanks.  I will look hard at this :-)  My apologies for bothering you so much 
with all these questions.  I just got a bit interested in it all :-)


Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:59                                                                       ` Andy Parkins
@ 2006-12-01 14:07                                                                         ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 14:07 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 01:59:58PM +0000, Andy Parkins wrote:
> My apologies for bothering you so much with all these questions.

You were not bothering me.
Those were really interesting and valid questions.
In fact, it was a long way for me to come to the implementation I have
now.  And I really did ask myself many of those questions, too.

I should really write a nice paper about all of that, I think.

> I just got a bit interested in it all :-)

Good :-)

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 12:28                                                                     ` Martin Waitz
@ 2006-12-01 14:11                                                                       ` Andy Parkins
  2006-12-01 15:12                                                                         ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 14:11 UTC (permalink / raw)
  To: git

On Friday 2006 December 01 12:28, Martin Waitz wrote:

> > It seems wrong because it's making commits in the supermodule that aren't
> > commits to do with that project.
>
> Of course they are part of your project, just like all the tree and blob
> objects, too.

I wouldn't go as far as that; just because I use libxcb doesn't mean I want 
its history merged with mine.  However, I think my worries are unfounded; 
your comment about being able to independently clone the libxcb tree helped 
me there.  If I've understood; while the objects themselves are stored in the 
supermodule ODB, they are still independent.  In fact, they're only in the 
supermodule tree because it's most convenient to keep them there; it sounds 
like it's very easy to strip them out again.

> It is the commit of the submodule, so its parents point to the submodule
> history.

Again, if I'm understanding, it's a bit like when you have an additional root 
in a normal git repository, for example:

 * -- * -- * -- * (project1)
       \
        * -- * -- * (project1/stable)

   * -- * -- * -- * (project2)

Then to make project2 a submodule of project1, one of the project1 trees 
simply refers to a commit in project2.
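
The picture above can be sketched as a toy model. This is only an illustration under stated assumptions: `odb`, `put`, `make_tree`, `make_commit` and `entry_kind` are hypothetical helpers, and the object encoding is simplified rather than git's real format. It shows two independent roots living in one object database, with a project1 tree entry that points at a project2 commit.

```python
import hashlib

# Toy object database shared by both histories (hypothetical, not git's on-disk format).
odb = {}

def put(kind, payload):
    """Store an object and return its content-derived id."""
    oid = hashlib.sha1(f"{kind}\n{payload}".encode()).hexdigest()
    odb[oid] = (kind, payload)
    return oid

def make_tree(entries):
    """entries: iterable of (kind, name, oid); kind is 'blob', 'tree' or 'commit'."""
    lines = sorted(f"{kind} {oid} {name}" for kind, name, oid in entries)
    return put("tree", "\n".join(lines))

def make_commit(tree_oid, parents=()):
    """A commit records one tree and zero or more parent commits."""
    header = [f"tree {tree_oid}"] + [f"parent {p}" for p in parents]
    return put("commit", "\n".join(header))

def entry_kind(tree_oid, name):
    """Return the kind recorded for `name` in the given tree."""
    for line in odb[tree_oid][1].split("\n"):
        kind, _oid, entry_name = line.split(" ", 2)
        if entry_name == name:
            return kind
    raise KeyError(name)

# project2: an independent root with its own two-commit history.
p2_c1 = make_commit(make_tree([("blob", "xcb.c", put("blob", "int xcb;"))]))
p2_c2 = make_commit(make_tree([("blob", "xcb.c", put("blob", "int xcb = 1;"))]), [p2_c1])

# project1: a second root; one of its trees simply refers to a project2 commit.
p1_tree = make_tree([
    ("blob", "main.c", put("blob", "int main(void) { return 0; }")),
    ("commit", "libxcb", p2_c2),
])
p1_c1 = make_commit(p1_tree)
```

Note that the project1 commit has no parent link into project2; the only connection is the tree entry, so the two histories stay independent.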

I think my original idea for how this works was correct with one minor flaw, 
and from that flaw all the other concerns flowed.  I imagined that there were 
two object databases - one for the supermodule and one for the submodule.  
The flaw was that there aren't two ODBs; there are two roots, which of 
course is a far easier way to blend two repositories.  Apart from that, I 
think I'm entirely in sync, and it was merely my wanting to put each of these 
roots in their own repository that caused all the confusion.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:43                                                       ` Martin Waitz
@ 2006-12-01 14:23                                                         ` Stephan Feder
  2006-12-01 15:07                                                           ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Stephan Feder @ 2006-12-01 14:23 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 02:21:20PM +0100, sf wrote:
>> I just had a short (really short) look at your work. My impression is 
>> that your repository setup is much too complicated.
> 
> Well, I'm not really satisfied with the UI part.
> What exactly do you find complicated?

I am not talking about the UI. On the contrary, I am talking about the 
repository, i.e. the directories and files on your disk and their 
contents. You add a lot of additional information to the repository that 
is not needed at all.

>> As I proposed elsewhere: For submodules to work you only need to allow 
>> commits in tree objects (that is what your implementation requires as 
>> well). Everything else is in the tools. Much simpler.
> 
> I do not quite get your point.
> The core of my work allows putting commits into tree objects.

That is fine.

> Then there is some more (but not quite finished) work to make the tools
> work together with submodules.  So no, not everything is there yet.

And what is already there is a lot of meta information (see above). You 
do not need that.

For example, in the index, if it is a commit (i.e. a subproject), store 
the commit id (not the commit's tree id ). Make the tools handle this 
case (as yet, all code expects only trees and blobs when they parse 
trees). Especially, extend update-index to be able to store a commit 
instead of the tree.

Or else, do not change what is recorded in the index. Then, at commit 
time, you not only commit the superproject but also all subprojects.

Or allow both.

Anyway, you can create commits in tree objects. See, you did not need to 
  store any additional information in the repository.
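
The first alternative can be sketched as follows. This is a minimal illustration, assuming a flat index (no nested directories) and simplified object ids; `oid_of`, `update_index` and `write_tree` are hypothetical stand-ins for the real tools, not their actual interfaces. The point is only that an index entry may carry a commit id, and that writing the tree passes that id through unchanged.

```python
import hashlib

def oid_of(kind, payload):
    """Content-derived id for a toy object (simplified, not git's real encoding)."""
    return hashlib.sha1(f"{kind}\n{payload}".encode()).hexdigest()

def update_index(index, path, kind, oid):
    """Record a blob or a commit (subproject) at `path` in the index."""
    if kind not in ("blob", "commit"):
        raise ValueError(f"unsupported index entry kind: {kind}")
    index[path] = (kind, oid)

def write_tree(index):
    """Turn the index into a tree; commit entries are stored as-is."""
    lines = [f"{kind} {oid}\t{path}" for path, (kind, oid) in sorted(index.items())]
    payload = "\n".join(lines)
    return oid_of("tree", payload), payload

index = {}
update_index(index, "main.c", "blob", oid_of("blob", "int main(void) { return 0; }"))

sub_commit_a = oid_of("commit", "subproject state A")
update_index(index, "xdiff", "commit", sub_commit_a)
tree_a, payload_a = write_tree(index)

# Recording a different subproject commit changes the resulting tree id,
# just as changing a file's blob would.
sub_commit_b = oid_of("commit", "subproject state B")
update_index(index, "xdiff", "commit", sub_commit_b)
tree_b, payload_b = write_tree(index)
```

In this model the superproject needs no extra metadata: the commit id in the tree is the whole link to the subproject.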

To push and pull you have to extend the tools as well. That is the next 
step.

Regards

Stephan

-- 
b.i.t.
beratungsgesellschaft für informations-technologie mbh
Stephan Feder
elisabethenstr. 62   fon: +49(0)6151/827575
64283 darmstadt      fax: +49(0)6151/827576
mailto:sf@b-i-t.de   www: http://www.b-i-t.de

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:46                                                             ` Martin Waitz
@ 2006-12-01 14:52                                                               ` Andreas Ericsson
  2006-12-01 15:00                                                                 ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-01 14:52 UTC (permalink / raw)
  To: Martin Waitz; +Cc: sf, git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 02:43:16PM +0100, Andreas Ericsson wrote:
>> So a commit in the supermodule turns into a commit in the submodule? 
> 
> no.
> 
>> If it doesn't, why would the submodule HEAD have to change?
> 
> So how do you update your submodule?
> 

By committing to it separately, or by getting changes from the upstream 
project (openssl, libcurl, ...).

> Remember: if you git-pull in the supermodule, you want to update the
> whole thing, including all submodules.
> 

Only if the new commits I pull into the supermodule DAG include a new 
snapshot of the submodule; otherwise it wouldn't be 
necessary.
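
That condition can be sketched as a small check. This is an illustration only: the trees are modelled as plain dictionaries, and `changed_submodules` is a hypothetical helper, not part of any existing tool. It reports the submodule paths whose recorded commit differs between the supermodule tree before and after a pull; removed submodules are not handled.

```python
def changed_submodules(old_tree, new_tree):
    """Return submodule paths whose recorded commit differs in `new_tree`.

    Each tree is a dict mapping path -> (kind, oid), where kind is
    'blob' or 'commit'.
    """
    changed = []
    for path, (kind, oid) in sorted(new_tree.items()):
        if kind != "commit":
            continue  # ordinary files are handled by the normal checkout
        if old_tree.get(path) != (kind, oid):
            changed.append(path)
    return changed

before = {
    "main.c": ("blob", "b1"),
    "openssl": ("commit", "c1"),
    "libcurl": ("commit", "c7"),
}
after = {
    "main.c": ("blob", "b2"),      # file changed, no submodule involved
    "openssl": ("commit", "c2"),   # new snapshot of the submodule
    "libcurl": ("commit", "c7"),   # unchanged
}

to_update = changed_submodules(before, after)
```

Only `openssl` needs updating here; a pull that touches nothing but `main.c` would leave every submodule alone.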

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 13:51                                                           ` Stephan Feder
@ 2006-12-01 14:58                                                             ` Martin Waitz
  2006-12-01 15:47                                                               ` Stephan Feder
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 14:58 UTC (permalink / raw)
  To: Stephan Feder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 02:51:49PM +0100, Stephan Feder wrote:
> If you work in the supermodule, even if it is in the code of the
> submodule, you only commit to the supermodule. The submodule does not
> "know" about these changes after step 1.

I think we are using totally different definitions of "submodule".

For me a submodule is responsible for everything in or below a certain
directory.  So by definition when you change something in this
directory, you have to change it in the submodule.
You can't change the submodule contents in the supermodule without also
changing the submodule.
This is just like you can't commit a change to a file without also
changing the file.

Then the supermodule just records the current content of the entire
tree.  The only new thing is that instead of simple files there are now
submodules, and those are recorded as well.

> Why do you mix up supermodule and submodule? The way I see your proposal 
> you cannot change submodule and supermodule independently. That is a 
> huge drawback.

No, this is the benefit you get by introducing submodules.
Why would you want to introduce a submodule when it is not linked to the
supermodule?

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 14:52                                                               ` Andreas Ericsson
@ 2006-12-01 15:00                                                                 ` Martin Waitz
  2006-12-01 16:38                                                                   ` Andreas Ericsson
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 15:00 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: sf, git

[-- Attachment #1: Type: text/plain, Size: 565 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 03:52:29PM +0100, Andreas Ericsson wrote:
> >Remember: if you git-pull in the supermodule, you want to update the
> >whole thing, including all submodules.
> >
> 
> Only if the new commits I pull into the supermodule DAG include a new 
> snapshot of the submodule; otherwise it wouldn't be 
> necessary.

Of course.

But if the supermodule contains changes to the submodule, you still
have to change the submodule.  And this implies changing the submodule
HEAD or some branch.

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 14:23                                                         ` Stephan Feder
@ 2006-12-01 15:07                                                           ` Martin Waitz
  2006-12-01 16:04                                                             ` Stephan Feder
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 15:07 UTC (permalink / raw)
  To: Stephan Feder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 03:23:07PM +0100, Stephan Feder wrote:
> And what is already there is a lot of meta information (see above). You 
> do not need that.

What information are you referring to?
Perhaps you have looked into my old branch?
The current implementation is in "module2".

> For example, in the index, if it is a commit (i.e. a subproject), store 
> the commit id (not the commit's tree id ).

This is exactly what I have done.

> Especially, extend update-index to be able to store a commit 
> instead of the tree.

Done, except that update-index never stores trees ;-)

> Or else, do not change what is recorded in the index. Then, at commit 
> time, you not only commit the superproject but also all subprojects.

But then submodules would be handled differently from files, which I
wanted to avoid.

> To push and pull you have to extend the tools as well. That is the next 
> step.

Also done.

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 14:11                                                                       ` Andy Parkins
@ 2006-12-01 15:12                                                                         ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 15:12 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 765 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 02:11:19PM +0000, Andy Parkins wrote:
> If I've understood; while the objects themselves are stored in the
> supermodule ODB, they are still independent.  In fact, they're only in
> the supermodule tree because it's most convenient to keep them there;
> it sounds like it's very easy to strip them out again.

Yes.

> Again, if I'm understanding, it's a bit like when you have an
> additional root in a normal git repository, for example:
>
>  * -- * -- * -- * (project1)
>        \
>         * -- * -- * (project1/stable)
>
>    * -- * -- * -- * (project2)
> 
> Then to make project2 a submodule of project1, one of the project1
> trees simply refers to a commit in project2.

Exactly.


-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 14:58                                                             ` Martin Waitz
@ 2006-12-01 15:47                                                               ` Stephan Feder
  2006-12-01 16:54                                                                 ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Stephan Feder @ 2006-12-01 15:47 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 02:51:49PM +0100, Stephan Feder wrote:
>> If you work in the supermodule, even if it is in the code of the
>> submodule, you only commit to the supermodule. The submodule does not
>> "know" about these changes after step 1.
> 
> I think we are using totally different definitions of "submodule".

Not so different. The way I see it is that "I" (meaning with submodules 
implemented as I proposed) could pull regularly from "your" repositories 
(implemented as you proposed) and work with the result (including 
submodules). Could you do the same?

> For me a submodule is responsible for everything in or below a certain
> directory.  So by definition when you change something in this
> directory, you have to change it in the submodule.

But you do not consider the case where you cannot change the submodule 
because you do not own it.

For example, git has the subproject xdiff. If git had been able to work 
with subprojects as I envision, and if xdiff had been published as a git 
repository (not necessarily subproject enabled), it could have been 
pulled in git's subdirectory xdiff as a subproject. There would not have 
been a separate branch or even repository for xdiff in the git repository.

All changes to xdiff in git could have been committed to the git 
repository only. Independently, they could have been published to 
upstream and be put into the xdiff repository by its author. But the 
last part is what only the owner of the xdiff repository is able to decide.

(Ok, ok... the example sucks badly because xdiff has been massively 
changed for its usage in git so the changes would not be integrated by 
upstream. But you can imagine where you use a library essentially as is, 
only if you discover bugs you fix them immediately in your repository 
and keep those fixes in your version of the library, even on upgrade, 
until the bugs have been fixed by upstream.)

> You can't change the submodule contents in the supermodule without also
> changing the submodule.
> This is just like you can't commit a change to a file without also
> changing the file.

There is a difference. I would say: If you commit a change to a file in 
one branch, it need not be changed in all branches.

> Then the supermodule just records the current content of the entire
> tree.  The only new thing is that instead of simple files there are now
> submodules and that are also recorded.

Yes, and that is all you need. If the changes are to be part of a branch 
of the submodule, they have to be pulled. That is an independent operation.

>> Why do you mix up supermodule and submodule? The way I see your proposal 
>> you cannot change submodule and supermodule independently. That is a 
>> huge drawback.
> 
> No, this is the benefit you get by introducing submodules.
> Why would you want to introduce a submodule when it is not linked to the
> supermodule?

Because the submodule must be independent of the supermodule.

I see where you are coming from. You have one project that is divided 
into subprojects but the subprojects themselves are not independent.

What I would like to solve is the following: You have a project X, and 
this project is made part of two other projects Y and Z (as a submodule 
or subproject or whatever you want to call it). The project X need not, 
must not or cannot care that it was made a subproject. But in projects Y 
and Z, you must be able to bugfix or extend or modify the code of 
project X, and you must be able to push and pull changes between all 
three projects (of course we are only talking about the code part of 
project X).

Do you see where your solution makes that impossible, and does so while 
requiring more changes to the repository layout?

Regards

Stephan

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 15:07                                                           ` Martin Waitz
@ 2006-12-01 16:04                                                             ` Stephan Feder
  2006-12-01 16:15                                                               ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Stephan Feder @ 2006-12-01 16:04 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 03:23:07PM +0100, Stephan Feder wrote:
>> And what is already there is a lot of meta information (see above). You 
>> do not need that.
> 
> What information are you referring to?
> Perhaps you have looked into my old branch?
> The current implementation is in "module2".

I was looking into git-init-module.sh (branch module2). There you set up 
a separate git repository for the submodule and store references to it 
in the supermodule's repository.

> 
>> For example, in the index, if it is a commit (i.e. a subproject), store 
>> the commit id (not the commit's tree id ).
> 
> This is exactly what I have done.
> 
>> Especially, extend update-index to be able to store a commit 
>> instead of the tree.
> 
> Done, except that update-index never stores trees ;-)

Yes, I forgot.

>> Or else, do not change what is recorded in the index. Then, at commit 
>> time, you not only commit the superproject but also all subprojects.
> 
> But then submodules would be handled differently to files which I wanted
> to avoid.

On the other hand, it feels more natural to only commit at the end of 
your work. So both alternatives have their merits.

> 
>> To push and pull you have to extend the tools as well. That is the next 
>> step.
> 
> Also done.

I hope I have time to give your solution a try.

Regards

Stephan

-- 
b.i.t.
beratungsgesellschaft für informations-technologie mbh
Stephan Feder
elisabethenstr. 62   fon: +49(0)6151/827575
64283 darmstadt      fax: +49(0)6151/827576
mailto:sf@b-i-t.de   www: http://www.b-i-t.de

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 16:04                                                             ` Stephan Feder
@ 2006-12-01 16:15                                                               ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 16:15 UTC (permalink / raw)
  To: Stephan Feder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 05:04:19PM +0100, Stephan Feder wrote:
> I was looking into git-init-module.sh (branch module2). There you set up 
> a separate git repository for the submodule and store references to it 
> in the supermodule's repository.

yes.

This is to be able to call git-fsck-objects and git-prune in the
toplevel supermodule.  When traversing the object tree, it already knows
about all submodules, but only about those versions that are really part
of their supermodule.
So I have to teach git about separate submodule branches which may be
used, in order not to prune them away.
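
The problem can be sketched with a toy reachability walk. This is an illustration under stated assumptions: the object graph is a plain dictionary, and `reachable` and `prunable` are hypothetical helpers rather than the actual git-prune code. A commit that exists only on a submodule branch is not reachable from the supermodule's refs alone, so the submodule refs have to be added as extra roots.

```python
def reachable(objects, roots):
    """Return every object id reachable from `roots`.

    `objects` maps an id to the ids it references: a commit's tree and
    parents, or a tree's entries (which may include submodule commits).
    """
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen

def prunable(objects, roots):
    """Objects a prune would delete when starting from `roots`."""
    return set(objects) - reachable(objects, roots)

objects = {
    "super_c1": ["super_t1"],
    "super_t1": ["sub_c1"],          # tree entry pointing at a submodule commit
    "sub_c1": ["sub_t1"],
    "sub_t1": [],
    "sub_c2": ["sub_t2", "sub_c1"],  # on a submodule branch, not yet in the supermodule
    "sub_t2": [],
}
super_refs = ["super_c1"]
sub_refs = ["sub_c2"]

lost_without_sub_refs = prunable(objects, super_refs)
lost_with_sub_refs = prunable(objects, super_refs + sub_refs)
```

With only the supermodule refs as roots, `sub_c2` and its tree would be pruned; adding the submodule branch as a root keeps them.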

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 15:00                                                                 ` Martin Waitz
@ 2006-12-01 16:38                                                                   ` Andreas Ericsson
  2006-12-01 16:49                                                                     ` Linus Torvalds
  2006-12-01 16:57                                                                     ` Martin Waitz
  0 siblings, 2 replies; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-01 16:38 UTC (permalink / raw)
  To: Martin Waitz; +Cc: sf, git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 03:52:29PM +0100, Andreas Ericsson wrote:
>>> Remember: if you git-pull in the supermodule, you want to update the
>>> whole thing, including all submodules.
>>>
>> Only if the new commits I pull into the supermodule DAG include a 
>> new snapshot of the submodule; otherwise it wouldn't be 
>> necessary.
> 
> Of course.
> 
> But if the supermodule contains changes to the submodule, you still
> have to change the submodule.  And this implies changing the submodule
> HEAD or some branch.
> 

Not really. I fail to see why HEAD needs to be changed so long as the 
commit is in the submodule's odb.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-01 16:38                                                                   ` Andreas Ericsson
@ 2006-12-01 16:49                                                                     ` Linus Torvalds
  2006-12-01 17:08                                                                       ` sf
  2006-12-01 17:14                                                                       ` Martin Waitz
  2006-12-01 16:57                                                                     ` Martin Waitz
  1 sibling, 2 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-01 16:49 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Martin Waitz, sf, git

On Fri, 1 Dec 2006, Andreas Ericsson wrote:

> Martin Waitz wrote:
> > 
> > But if the supermodule contains changes to the submodule, you still
> > have to change the submodule.  And this implies changing the submodule
> > HEAD or some branch.
> > 
> 
> Not really. I fail to see why HEAD needs to be changed so long as the commit
> is in the submodule's odb.

Right. A commit in the supermodule should _not_ imply a commit in the 
submodule.

Maybe I should take a look at the code, but it sounds like people are 
still trying to "mix" submodules too much. 

Think of it this way: one common use for submodules is really to just 
(occasionally) track somebody else's code. The submodule should be a 
totally pristine copy from somebody else (ie it might be the "intel driver 
for X.org" submodule, maintained within intel), and the supermodule just 
refers to it indirectly (ie the supermodule might be the "Fedora Core X 
group" which contains all the different drivers from different people).

So anything that mixes super-modules and sub-modules too much will always 
break this kind of model.

A supermodule can never "contain changes" to a submodule. A supermodule 
would always just point to the submodule, and not have any changes 
what-so-ever of its own. The submodule is self-sufficient, and always 
contains all its _own_ changes.


* Re: [RFC] Submodules in GIT
  2006-12-01 15:47                                                               ` Stephan Feder
@ 2006-12-01 16:54                                                                 ` Martin Waitz
  2006-12-01 17:33                                                                   ` Stephan Feder
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 16:54 UTC (permalink / raw)
  To: Stephan Feder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 4870 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 04:47:47PM +0100, Stephan Feder wrote:
> Not so different. The way I see it is that "I" (meaning with submodules 
> implemented as I proposed) could pull regularly from "your" repositories 
> (implemented as you proposed) and work with the result (including 
> submodules). Could you do the same?

Sorry, but with all that many people proposing things I am a bit lost
now.  Sometimes I thought you want exactly the same thing as I do,
sometimes I think we are talking in totally different directions.

> >For me a submodule is responsible for everything in or below a certain
> >directory.  So by definition when you change something in this
> >directory, you have to change it in the submodule.
> 
> But you do not consider the case where you cannot change the submodule 
> because you do not own it.

I do not understand you here.
The submodule is part of the supermodule, and the one who sets up the
repository owns the whole thing, including all submodules, just like all
the files which are part of the project.

If you mean the upstream repository of the submodule, then yes, this is
of course completely separated from the submodule and may be owned by
someone else.  Consequently, this upstream repository of course does not
need to change when someone introduces changes in the supermodule.

> For example, git has the subproject xdiff. If git had been able to work 
> with subprojects as I envision, and if xdiff had been published as a git 
> repository (not necessarily subproject enabled), it could have been 
> pulled in git's subdirectory xdiff as a subproject.

This could have been done if submodule support would have been available
at the time xdiff was introduced, yes.

> There would not have been a separate branch or even repository for
> xdiff in the git repository.

What separate branch or repository are you talking about?

> All changes to xdiff in git could have been committed to the git 
> repository only.

Yes, but if it would have been integrated as a submodule it obviously
would have been committed to the xdiff submodule inside the git
repository.
So the changes are really part of the git repository, but you could go
to the "git/xdiff" directory and only see the changes in the submodule,
without the normal supermodule history.

> Independently, they could have been published to upstream and be put
> into the xdiff repository by its author.  But the last part is what
> only the owner of the xdiff repository is able to decide.

Of course, everything still works like normal git repositories.

> >You can't change the submodule contents in the supermodule without also
> >changing the submodule.
> >This is just like you can't commit a change to a file without also
> >changing the file.
> 
> There is a difference. I would say: If you commit a change to a file in 
> one branch, it need not be changed in all branches.

But you need to change _at_least_ one branch.
Otherwise you cannot commit to a branch.

So if you change something in a submodule, you have to change one branch
in the submodule.
If you call git-checkout in the supermodule this will result in
something like a git-reset in the submodule.

> >No, this is the benefit you get by introducing submodules.
> >Why would you want to introduce a submodule when it is not linked to the
> >supermodule?
> 
> Because the submodule must be independent of the supermodule.
> 
> I see where you are coming from. You have one project that is divided 
> into subprojects but the subprojects themselves are not independent.
> 
> What I would like to solve is the following: You have a project X, and 
> this project is made part of two other projects Y and Z (as a submodule 
> or subproject or whatever you want to call it). The project X need not, 
> must not or cannot care that it was made a subproject. But in projects Y 
> and Z, you must be able to bugfix or extend or modify the code of 
> project X, and you must be able to push and pull changes between all 
> three projects (of course we are only talking about the code part of 
> project X).

Of course.

So if you wanted to check out everything, you could have something like
~/src/X, ~/src/Y/X, and ~/src/Z/X.
All of these would be GIT repositories, all of them have their
independent branches.

What I am saying is just that if you update Y, and the new Y contains an
updated version of X, then ~/src/Y/X/.git/refs/heads/master will be
changed by the pull, resulting in the new version of X being checked out
in ~/src/Y/X (alongside all the other updates inside ~/src/Y).
This of course is independent of ~/src/X or ~/src/Z/X.
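
A rough sketch of that behaviour, assuming a toy model in Python in which
each repository is just a mapping from branch name to commit id.  The
function `update_submodules`, the tracked branch name and all ids are
hypothetical and only illustrate the idea, not the prototype's code:

```python
# Hypothetical model: each repository is just a dict of branch name -> commit id.
def update_submodules(super_tree, submodule_refs, branch="master"):
    """Point each submodule's tracked branch at the commit recorded in the tree.

    super_tree maps a path to ("blob", id) or ("commit", id); only "commit"
    entries are submodules.  Returns the list of paths whose branch moved.
    """
    moved = []
    for path, (kind, oid) in sorted(super_tree.items()):
        if kind != "commit":
            continue
        refs = submodule_refs.setdefault(path, {})
        if refs.get(branch) != oid:
            refs[branch] = oid
            moved.append(path)
    return moved


# Three independent copies of X, as in ~/src/X, ~/src/Y/X and ~/src/Z/X.
standalone_x = {"master": "x1"}
refs_in_y = {"X": {"master": "x1", "topic": "x9"}}
refs_in_z = {"X": {"master": "x1"}}

# The newly pulled version of Y records commit x2 for its submodule X.
new_y_tree = {"Makefile": ("blob", "b7"), "X": ("commit", "x2")}
moved = update_submodules(new_y_tree, refs_in_y)
print(moved, refs_in_y, standalone_x, refs_in_z)
```

Only the tracked branch of the X inside Y moves in this model; the other
branch in Y/X and the copies in ~/src/X and ~/src/Z/X stay where they were.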

> Do you see where your solution makes that impossible, and that with more 
> changes to the repository layout?

No ;-)

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 16:38                                                                   ` Andreas Ericsson
  2006-12-01 16:49                                                                     ` Linus Torvalds
@ 2006-12-01 16:57                                                                     ` Martin Waitz
  2006-12-01 18:08                                                                       ` Andreas Ericsson
  1 sibling, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 16:57 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: sf, git

[-- Attachment #1: Type: text/plain, Size: 652 bytes --]

On Fri, Dec 01, 2006 at 05:38:44PM +0100, Andreas Ericsson wrote:
> >But if the supermodule contains changes to the submodule, you still
> >have to change the submodule.  And this implies changing the submodule
> >HEAD or some branch.
> >
> 
> Not really. I fail to see why HEAD needs to be changed so long as the 
> commit is in the submodule's odb.

Because I want the submodule to act as a normal git repository.
Please note that I also voted against changing HEAD directly, but that
the new commit which came from the supermodule is just stored in one
branch of the submodule, as part of the supermodule checkout.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 16:49                                                                     ` Linus Torvalds
@ 2006-12-01 17:08                                                                       ` sf
  2006-12-01 18:06                                                                         ` Andreas Ericsson
  2006-12-01 20:13                                                                         ` Linus Torvalds
  2006-12-01 17:14                                                                       ` Martin Waitz
  1 sibling, 2 replies; 252+ messages in thread
From: sf @ 2006-12-01 17:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Waitz, git

Linus Torvalds wrote:
...
 > Think of it this way: one common use for submodules is really to just
 > (occasionally) track somebody else's code. The submodule should be a
 > totally pristine copy from somebody else (ie it might be the "intel driver
 > for X.org" submodule, maintained within intel), and the supermodule just
 > refers to it indirectly (ie the supermodule might be the "Fedora Core X
 > group" which contains all the different drivers from different people).

Could you please be a little bit more specific about how you would store 
the "pristine copy"? There seems to be some agreement to store the 
commit id of the submodule instead of a plain tree id in the 
supermodule's tree object, and that all objects that are reachable from 
this commit are made part of the supermodule repository (either fetched 
or via alternates). Do you agree?

...
 > A supermodule can never "contain changes" to a submodule. A supermodule
 > would always just point to the submodule, and not have any changes
 > what-so-ever of its own. The submodule is self-sufficient, and always
 > contains all its _own_ changes.

That is one of the points Martin Waitz and I are discussing.

If I understand you correctly you cannot make any changes to the 
submodule's code _in the supermodule's repository_, no bugfixes, no 
extensions, no adaptations, nothing. Do you mean that?

That would be a third alternative. In my opinion the usefulness of 
submodules would be unnecessarily restricted if the choice came down 
to either using the code from upstream as is or not using submodules at 
all. What is the point of the restriction?

Regards


* Re: [RFC] Submodules in GIT
  2006-12-01 16:49                                                                     ` Linus Torvalds
  2006-12-01 17:08                                                                       ` sf
@ 2006-12-01 17:14                                                                       ` Martin Waitz
  1 sibling, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 17:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Ericsson, sf, git

[-- Attachment #1: Type: text/plain, Size: 1606 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 08:49:20AM -0800, Linus Torvalds wrote:
> Think of it this way: one common use for submodules is really to just 
> (occasionally) track somebody else's code. The submodule should be a 
> totally pristine copy from somebody else (ie it might be the "intel driver 
> for X.org" submodule, maintained within intel), and the supermodule just 
> refers to it indirectly (ie the supermodule might be the "Fedora Core X 
> group" which contains all the different drivers from different people).

Yes, but it is not only about tracking, also about distributing
submodules.

One Fedora X developer fixes a bug in the intel driver, commits that to
the submodule and then updates the supermodule to the new version (by
calling "git-update-index drivers/intel && git-commit" or something).  Then
another Fedora X developer updates his X repository.  By pulling the
supermodule he also gets a new version of the submodule.
And this new version of the submodule is stored in a branch which can be
accessed by the submodule.

> A supermodule can never "contain changes" to a submodule.

The supermodule always contains _the_entire_ submodule with its complete
history, so it also does contain changes.  But it does not per-se
contain changes, only indirectly (i.e. the commits in the submodule are
not part of the supermodule commit chain).

> A supermodule would always just point to the submodule, and not have
> any changes what-so-ever of its own. The submodule is self-sufficient,
> and always contains all its _own_ changes.

Yes.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 16:54                                                                 ` Martin Waitz
@ 2006-12-01 17:33                                                                   ` Stephan Feder
  2006-12-01 18:48                                                                     ` Martin Waitz
                                                                                       ` (2 more replies)
  0 siblings, 3 replies; 252+ messages in thread
From: Stephan Feder @ 2006-12-01 17:33 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 04:47:47PM +0100, Stephan Feder wrote:
>> Not so different. The way I see it is that "I" (meaning with submodules 
>> implemented as I proposed) could pull regularly from "your" repositories 
>> (implemented as you proposed) and work with the result (including 
>> submodules). Could you do the same?
> 
> Sorry, but with all that many people proposing things I am a bit lost
> now.  Sometimes I thought you want exactly the same thing as I do,
> sometimes I think we are talking in totally different directions.

We are in agreement about two fundamental parts of the implementation 
and their meaning:

1. A submodule is stored as a commit id in a tree object.

2. Every object that is reachable from the submodule's commit is 
reachable from the supermodule's repository.

Please confirm.

We now argue about how to work with that repository _object_ model.

>> >For me a submodule is responsible for everything in or below a certain
>> >directory.  So by definition when you change something in this
>> >directory, you have to change it in the submodule.
>> 
>> But you do not consider the case where you cannot change the submodule 
>> because you do not own it.
> 
> I do not understand you here.
> The submodule is part of the supermodule, and the one who sets up the
> repository owns the whole thing, including all submodules, just like all
> the files which are part of the project.

If you mean by "owns the whole thing" what I stated above in 2., then we 
agree.

> If you mean the upstream repository of the submodule, then yes, this is
> of course completely separated from the submodule and may be owned by
> someone else.  Consequently, this upstream repository of course does not
> need to change when someone introduces changes in the supermodule.

I think we still agree.

>> For example, git has the subproject xdiff. If git had been able to work 
>> with subprojects as I envision, and if xdiff had been published as a git 
>> repository (not necessarily subproject enabled), it could have been 
>> pulled in git's subdirectory xdiff as a subproject.
> 
> This could have been done if submodule support would have been available
> at the time xdiff was introduced, yes.
> 
>> There would not have been a separate branch or even repository for
>> xdiff in the git repository.
> 
> What separate branch or repository are you talking about?

That's it: There is no need for a separate branch or repository. If you 
have the subproject's commit in the superproject's object database (and 
we really have that, see 1. and 2. above), why do you _have to_ store it 
elsewhere?

>> All changes to xdiff in git could have been committed to the git 
>> repository only.
> 
> Yes, but if it would have been integrated as a submodule it obviously
> would have been committed to the xdiff submodule inside the git
> repository.

No. The xdiff submodule would only exist as part of the git repository. 
You could, f.e., access the xdiff commit in git HEAD as HEAD:xdiff// 
(again my proposed syntax). HEAD:xdiff//~2:xemit.c would give you the 
grandparent of xemit.c in the xdiff submodule. And so on. You can even 
have submodules that have themselves submodules.

> So the changes are really part of the git repository, but you could go
> to the "git/xdiff" directory and only see the changes in the submodule,
> without the normal supermodule history.

See above.

>> Independently, they could have been published to upstream and be put
>> into the xdiff repository by its author.  But the last part is what
>> only the owner of the xdiff repository is able to decide.
> 
> Of course, everything still works like normal git repositories.

OK.

>> >You can't change the submodule contents in the supermodule without also
>> >changing the submodule.
>> >This is just like you can't commit a change to a file without also
>> >changing the file.
>> 
>> There is a difference. I would say: If you commit a change to a file in 
>> one branch, it need not be changed in all branches.
> 
> But you need to change _at_least_ one branch.
> Otherwise you cannot commit to a branch.

But only the supermodule's branch.

> So if you change something in a submodule, you have to change one branch
> in the submodule.

No.

> If you call git-checkout in the supermodule this will result in
> something like a git-reset in the submodule.

If you mean the submodule repository created by init-module I 
understand. But why create this "helper repository" at all?

>> >No, this is the benefit you get by introducing submodules.
>> >Why would you want to introduce a submodule when it is not linked to the
>> >supermodule?
>> 
>> Because the submodule must be independent of the supermodule.
>> 
>> I see where you are coming from. You have one project that is divided 
>> into subprojects but the subprojects themselves are not independent.
>> 
>> What I would like to solve is the following: You have a project X, and 
>> this project is made part of two other projects Y and Z (as a submodule 
>> or subproject or whatever you want to call it). The project X need not, 
>> must not or cannot care that it was made a subproject. But in projects Y 
>> and Z, you must be able to bugfix or extend or modify the code of 
>> project X, and you must be able to push and pull changes between all 
>> three projects (of course we are only talking about the code part of 
>> project X).
> 
> Of course.
> 
> So if you wanted to check out everything, you could have something like
> ~/src/X, ~/src/Y/X, and ~/src/Z/X.
> All of these would be GIT repositories, all of them have their
> independent branches.
> 
> What I am saying is just that if you update Y, and the new Y contains an
> updated version of X, then ~/src/Y/X/.git/refs/heads/master will be
> changed by the pull, resulting in the new version of X being checked out
> in ~/src/Y/X (alongside all the other updates inside ~/src/Y).
> This of course is independent of ~/src/X or ~/src/Z/X.
> 
>> Do you see where your solution makes that impossible, and that with more 
>> changes to the repository layout?
> 
> No ;-)
> 

Sorry, have to leave for home so I must leave that uncommented. 
Hopefully I can join in during the weekend.

Regards



* Re: [RFC] Submodules in GIT
  2006-12-01 17:08                                                                       ` sf
@ 2006-12-01 18:06                                                                         ` Andreas Ericsson
  2006-12-01 20:13                                                                         ` Linus Torvalds
  1 sibling, 0 replies; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-01 18:06 UTC (permalink / raw)
  To: sf; +Cc: git, Martin Waitz

sf wrote:
> 
> That is one of the points Martin Waitz and I are discussing.
> 
> If I understand you correctly you cannot make any changes to the 
> submodule's code _in the supermodule's repository_, no bugfixes, no 
> extensions, no adaptions, nothing. Do you mean that?
> 
> That would be a third alternative. In my opinion the usefulness of 
> submodules would be unnecessarily restricted if the choice came down 
> to either using the code from upstream as is or not using submodules at 
> all. What is the point of the restriction?
> 

That depends on your definition of submodule. In my eyes, a submodule is 
a separate repo that can be committed to separately (and generally also 
built separately), although it's usually built into something else. I'm 
imagining most submodules will contain only library code and its testing 
routines.

Insofar as I've envisioned submodules, it's a separate git repo where 
you simply record a certain snapshot of the sub-repo with a commit in 
the super-module, like so:

$ git commit ssl-functions/*.[ch] openssl -m "Upgraded openssl with necessary changes to core code"

(yes, I know it's horrid to use -m to commit, and I daily advocate 
against it where I work, but you get the idea, I'm sure)

Isn't this how it's supposed to work? Enlighten me, and please remember 
that I'm drunk atm, so make it obvious ;-)

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-01 16:57                                                                     ` Martin Waitz
@ 2006-12-01 18:08                                                                       ` Andreas Ericsson
  2006-12-01 18:51                                                                         ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-01 18:08 UTC (permalink / raw)
  To: Martin Waitz; +Cc: sf, git



Martin Waitz wrote:
> On Fri, Dec 01, 2006 at 05:38:44PM +0100, Andreas Ericsson wrote:
>>> But if the supermodule contains changes to the submodule, you still
>>> have to change the submodule.  And this implies changing the submodule
>>> HEAD or some branch.
>>>
>> Not really. I fail to see why HEAD needs to be changed so long as the 
>> commit is in the submodule's odb.
> 
> Because I want the submodule to act as a normal git repository.
> Please note that I also voted against changing HEAD directly, but that
> the new commit which came from the supermodule is just stored in one
> branch of the submodule, as part of the supermodule checkout.
> 

You're assuming the super- and sub-module will share HEAD, or at least 
ODB, I think. I'm not convinced this is necessary. Convince me. I'll go 
drink bear and get some dancing done while you're at it ;-)

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-01 17:33                                                                   ` Stephan Feder
@ 2006-12-01 18:48                                                                     ` Martin Waitz
  2006-12-01 23:34                                                                       ` sf
  2006-12-01 19:17                                                                     ` Andy Parkins
  2006-12-02 13:08                                                                     ` Jakub Narebski
  2 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 18:48 UTC (permalink / raw)
  To: Stephan Feder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3581 bytes --]

On Fri, Dec 01, 2006 at 06:33:12PM +0100, Stephan Feder wrote:
> We are in agreement about two fundamental parts of the implementation 
> and their meaning:
> 
> 1. A submodule is stored as a commit id in a tree object.
> 
> 2. Every object that is reachable from the submodule's commit is 
> reachable from the supermodule's repository.

Correct.

> >>For example, git has the subproject xdiff. If git had been able to work 
> >>with subprojects as I envision, and if xdiff had been published as a git 
> >>repository (not necessarily subproject enabled), it could have been 
> >>pulled in git's subdirectory xdiff as a subproject.
> >
> >This could have been done if submodule support would have been available
> >at the time xdiff was introduced, yes.
> >
> >>There would not have been a separate branch or even repository for
> >>xdiff in the git repository.
> >
> >What separate branch or repository are you talking about?
> 
> That's it: There is no need for a separate branch or repository. If you 
> have the subproject's commit in the superproject's object database (and 
> we really have that, see 1. and 2. above), why do you _have to_ store it 
> elsewhere?

Let's see if I understand you correctly:

You don't want to create an additional .git directory for the submodule
and just handle everything with one toplevel .git repository for the
whole project.
Without the .git directory, you of course do not have refs/heads inside
the submodule.

So this is a different user-interface approach to submodules when
compared to my approach.  But the basis is the same and both could
inter-operate.

Now your submodule is no longer seen as an independent git repository
and I think this would cause problems when you want to push/pull between
the submodule and its upstream repository.
No technical problems, but UI-problems because now your submodule is
handled completely differently from a "normal" repository.

> >Yes, but if it would have been integrated as a submodule it obviously
> >would have been committed to the xdiff submodule inside the git
> >repository.
> 
> No. The xdiff submodule would only exist as part of the git repository. 

But you could still call the "xdiff" part of the git repository a
submodule.  And then changes to the xdiff directory result in a new
submodule commit, even when there is no direct reference to it.
So you'd still "commit to the xdiff submodule".

> You could, f.e., access the xdiff commit in git HEAD as HEAD:xdiff// 
> (again my proposed syntax). HEAD:xdiff//~2:xemit.c would give you the 
> grandparent of xemit.c in the xdiff submodule.

git-cat-file commit HEAD:xdiff already works out of the box (even
cat-file tree to get the submodule tree).  But up to now revision
parsing follows the file name only once.

What about just separating things with "/"?

commit HEAD
tree   HEAD/
blob   HEAD/Makefile
commit HEAD/xdiff
tree   HEAD/xdiff/
blob   HEAD/xdiff~2/xemit.c

this may add some confusion when used with hierarchical branches, but
it's still unique:

	refs/heads/master/xdiff/xemit.c

Just use as many path components until a matching reference is found,
then start peeling.
Or just use / between super and submodule:

	refs/heads/master:xdiff/xemit.c

I think this is easier to read than

	refs/heads/master:xdiff//:xemit.c
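
The "use as many path components until a matching reference is found"
rule could look roughly like this.  It is a sketch in Python under the
assumption that the known references are available as a plain set of
names; `split_ref_and_path` is a made-up helper, and the peeling step
(walking trees and submodule commits) is deliberately left out:

```python
def split_ref_and_path(spec, refs):
    """Split 'ref/path' into (ref, path components).

    Components are added one at a time until the prefix names a known
    reference; whatever is left over is the path to peel afterwards.
    """
    parts = spec.split("/")
    for i in range(1, len(parts) + 1):
        candidate = "/".join(parts[:i])
        if candidate in refs:
            return candidate, parts[i:]
    raise KeyError("no reference matches a prefix of %r" % spec)


# Hypothetical refs, including a hierarchical branch name.
refs = {"refs/heads/master", "refs/heads/topic/xdiff"}

plain = split_ref_and_path("refs/heads/master/xdiff/xemit.c", refs)
nested = split_ref_and_path("refs/heads/topic/xdiff/xemit.c", refs)
print(plain)
print(nested)
```

Since a branch cannot be both a ref and a directory of refs at the same
time, the first matching prefix is the only one, which is why the split
stays unique even with hierarchical branch names.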

> If you mean the submodule repository created by init-module I 
> understand. But why create this "helper repository" at all?

Because it helps "normal" git operations ;-)

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 18:08                                                                       ` Andreas Ericsson
@ 2006-12-01 18:51                                                                         ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 18:51 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: sf, git

[-- Attachment #1: Type: text/plain, Size: 385 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 07:08:12PM +0100, Andreas Ericsson wrote:
> You're assuming the super- and sub-module will share HEAD, or at least 
> ODB, I think.

not HEAD, only ODB.

> I'm not convinced this is necessary. Convince me. I'll go 
> drink bear and get some dancing done while you're at it ;-)

Get me a beer and I will convince you :-)

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-01 17:33                                                                   ` Stephan Feder
  2006-12-01 18:48                                                                     ` Martin Waitz
@ 2006-12-01 19:17                                                                     ` Andy Parkins
  2006-12-01 19:38                                                                       ` Martin Waitz
  2006-12-02 13:08                                                                     ` Jakub Narebski
  2 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 19:17 UTC (permalink / raw)
  To: git, sf; +Cc: Martin Waitz

On Friday 2006, December 01 17:33, Stephan Feder wrote:

> 1. A submodule is stored as a commit id in a tree object.
>
> 2. Every object that is reachable from the submodule's commit is
> reachable from the supermodule's repository.

I'm still not convinced about 2.  Why should any of the submodule commits be 
in the supermodule repository?  I know that is what you've implemented, but 
it still feels like too much of a blending of the submodule into the 
supermodule.

In fact, why should the submodule commits be even visible in the supermodule?  
That tree->submodule commit is sufficient; there isn't any need to view 
submodule history in the supermodule.



Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE

* Re: [RFC] Submodules in GIT
  2006-12-01 19:17                                                                     ` Andy Parkins
@ 2006-12-01 19:38                                                                       ` Martin Waitz
  2006-12-01 21:04                                                                         ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 19:38 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, sf

hoi :)

On Fri, Dec 01, 2006 at 07:17:17PM +0000, Andy Parkins wrote:
> In fact, why should the submodule commits be even visible in the
> supermodule?  That tree->submodule commit is sufficient; there isn't
> any need to view submodule history in the supermodule.

Well, but there is a need for a common object traversal.
You need that when sending all objects between two supermodule versions
and also when you determine which objects are still reachable.

The easiest way to implement the common object traversal is to have all
objects in one object repository.

It may be possible to use two object stores and still do the common
object traversal but I do not think that gives you any benefits.
You still don't have a totally separated repository then, because
you can't do a reachability analysis in the submodule repository alone.
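
To make the traversal argument concrete, here is a toy sketch in Python.
It is not git code: the object ids and the layout of the store are invented
for illustration. It assumes one object store in which a supermodule tree
entry may point at a submodule commit, and shows that a single walk from the
supermodule head then finds both the objects to send and the objects that
are no longer reachable.

```python
# Toy model, not git internals: every object lives in one store, keyed by id.
# Each entry lists the ids it points to. All ids here are invented.
OBJECTS = {
    "super-commit-2": ["super-commit-1", "super-tree-2"],
    "super-commit-1": ["super-tree-1"],
    "super-tree-2": ["blob-makefile", "sub-commit-b"],  # tree entry -> submodule commit
    "super-tree-1": ["blob-makefile", "sub-commit-a"],
    "sub-commit-b": ["sub-commit-a", "sub-tree-b"],
    "sub-commit-a": ["sub-tree-a"],
    "sub-tree-b": ["blob-lib-v2"],
    "sub-tree-a": ["blob-lib-v1"],
    "blob-makefile": [],
    "blob-lib-v1": [],
    "blob-lib-v2": [],
    "blob-orphan": [],  # referenced by nothing
}


def reachable(store, roots):
    """Return the set of ids reachable from roots by following links."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(store[oid])
    return seen


def objects_to_send(store, have, want):
    """Objects reachable from `want` that are not reachable from `have`."""
    return reachable(store, [want]) - reachable(store, [have])


if __name__ == "__main__":
    print(sorted(objects_to_send(OBJECTS, "super-commit-1", "super-commit-2")))
    print(sorted(set(OBJECTS) - reachable(OBJECTS, ["super-commit-2"])))
```

With two separate stores the same walk would have to hop between them at
every submodule entry, which is the extra work I would like to avoid.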

-- 
Martin Waitz

* Re: [RFC] Submodules in GIT
  2006-12-01 17:08                                                                       ` sf
  2006-12-01 18:06                                                                         ` Andreas Ericsson
@ 2006-12-01 20:13                                                                         ` Linus Torvalds
  2006-12-01 20:30                                                                           ` Martin Waitz
                                                                                             ` (3 more replies)
  1 sibling, 4 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-01 20:13 UTC (permalink / raw)
  To: sf; +Cc: git, Martin Waitz

On Fri, 1 Dec 2006, sf wrote:
>
> Linus Torvalds wrote:
> ...
> > Think of it this way: one common use for submodules is really to just
> > (occasionally) track somebody else's code. The submodule should be a
> > totally pristine copy from somebody else (ie it might be the "intel driver
> > for X.org" submodule, maintained within intel), and the supermodule just
> > refers to it indirectly (ie the supermodule might be the "Fedora Core X
> > group" which contains all the different drivers from different people).
> 
> Could you please be a little bit more specific about how you would store the
> "pristine copy".

Note that it's not necessarily "pristine", since the submodule clearly is 
a local git repository in its own right. So like _any_ git repository, you 
can (and may well end up) having your own local branches in the submodule, 
with your own local modifications.

So I'm not claiming that a submodule must always match some external git 
tree 100%, and that it must be read-only or anything like that. I'm just 
saying that I suspect that quite often, one of the MOST IMPORTANT parts is 
that the submodule is really something that somebody else technically 
maintains, and that this is actually one of the _reasons_ why it is a 
submodule in the first place. 

For example, a lot of projects end up having some kind of "library 
component" as a submodule. Take something like a video player project, 
which would have something like ffmpeg as a submodule, not because you'd 
maintain ffmpeg yourself, but simply because (let's say) the library 
interface changes enough, or you need a specific version with some of your 
own fixes that haven't been released widely yet, so you want to carry all 
the libraries you need _with_ you, even though you don't really maintain 
that submodule. You at most have some small extensions of your own.

Now, in this situation, it's really really _important_ that the submodule 
really is totally independent of the supermodule, for several reasons.

For example, since you don't "really" own that project, carrying around 
your own fixes is really really painful. We know it happens all the time, 
and a lot of projects end up needing their own version, but the _last_ 
thing you want is to be in merge hell all the time. So as a supermodule 
maintainer, the best possible thing for you is to be able to push back 
those local changes to the original project maintainer, so that you 
_don't_ have to maintain your own changes.

But you need to realize that the real maintainer of the submodule is 
TOTALLY UNINTERESTED in your supermodule. He's not going to maintain it, 
and in fact, if you have anything in the submodule that ends up talking 
about your supermodule, that's just going to make it a lot less likely 
that the upstream maintainer will ever pull your changes. He might take a 
diff from you, but in a perfect world, you'd actually be able to tell him: 

 "Hey, I've got a git repository with a few fixes to your ffmpeg git tree, 
  please pull from git://myhost.com/submodule.git to get these fixes:

	... explanation of fixes and commits that are relevant to
	ffmpeg, and have nothing to do with the supermodule, except
	that you need those bug-fixes because you _use_ ffmpeg ...

  Thanks"

See?

So this is why it's really important that the submodule really is a git 
repository in its own right, and why committing stuff in the supermodule 
must NEVER affect the submodule itself directly (it might _cause_ you to also 
do a commit in the submodule indirectly, but the submodule commit MUST be 
totally independent, and stand on its own).

Now, you don't _have_ to push things upstream, of course. You can always 
just maintain your own submodule branch, and every once in a while, inside 
the submodule, you do

	# fetch the development in the origin/master branch
	git fetch submodule-origin origin/master

	# rebase our own special magic sauce on top of that
	git rebase origin/master

to update your submodule, and _then_ you do a commit in the supermodule 
(after testing that the update is all ok, of course) which will update the 
"commit" pointer in the supermodule.

Notice? In this example, we really maintained the submodule AS a 
submodule. It was independent, but tied into the supermodule, so that when 
we clone the supermodule, or do things like bisection on a supermodule, we 
always end up cloning the submodule too (and in the case of bisection, we 
really only bisect the supermodule, but the submodule always gets 
"tracked" in the sense that we would always check out the state of the 
submodule that was appropriate for that particular commit in the 
supermodule).

> There seems to be some agreement to store the commit id of
> the submodule instead of a plain tree id in the supermodule's tree object, and
> that all objects that are reachable from this commit are made part of the
> supermodule repository (either fetched or via alternates). Do you agree?

Well, I would actually argue that you may often want to have a supermodule 
and then at least have the _option_ to decide to not fetch all the 
submodules.

For an example of this kind of usage, let me tell you how we operated at 
Transmeta a few years ago, which I'm not saying is the _only_ way to 
operate, but it's ONE way to do it, and I'll also explain _why_ we did it, 
and why we had submodules.

In the case of transmeta, we had our own tools, our own programs, and we 
"owned" all of those. We _also_ used a lot of external tools, like gcc 
etc. However, different people worked on different parts, and if you 
worked on the actual x86 JIT part, you probably didn't want to have all of 
the gcc stuff in your tree _too_. That just took a lot of space, and you 
really didn't want to compile the whole toolchain (which took hours), 
since there were precompiled binaries readily available.

Still, from a _release_ standpoint, when we released a new binary, that 
binary very much depended not just on the actual JIT sources, but on the 
whole toolchain. So if you wanted to be able to re-create a release, you 
really needed _everything_. You couldn't just take the "current version" 
of the toolchain, you needed to have the toolchain that was used AT THE 
TIME OF THE RELEASE.

And this is a _classic_ example of when you'd want to use submodules. 
Notice how everybody wanted _some_ of the submodules, but really only the 
release people wanted them _all_. The higher up the chain you were, the 
less likely you were to really want to muck around with the compiler and 
the linker, for example. 

And nobody really owned all modules. 

So what you really want is:

 - a supermodule maintainer that is not really the maintainer of _any_ of 
   the submodules, but that does the main "build world" infrastructure 
   (and generally would tend to also maintain the source control 
   infrastructure itself)

 - submodules that had their own maintainers, and where the maintainers 
   may or may not have wanted the supermodule, but even when they wanted 
   the supermodule, they might not want _all_ of the submodules, simply 
   because they just didn't care.

 - some of the submodules then have _upstream_ sources that were totally 
   independent, and that you would want to track, but you had zero power 
   AT ALL over them, and yet you might well want to push back at least some 
   of the fixes you did - at least the ones that made sense even outside 
   your own project - just to avoid having to maintain a _huge_ set of 
   internal patches.

So no, I don't think the supermodule should even _force_ people to always 
get all the submodules. It might be the default case, but at the same 
time, it's just being polite to let users decide on their own whether they 
really want _all_ of the build infrastructure sources.

> If I understand you correctly you cannot make any changes to the submodules
> code _in the supermodule's repository_, no bugfixes, no extensions, no
> adaptions, nothing. Do you mean that?

Yes. I think you should make all changes _within_ the submodule, because 
the submodule should still be an independent git tree in its own right.

But obviously, you'd often use a private _branch_ in the submodule because 
you end up having whatever private extensions. That's always true: we 
always have the "master" branch that is kind of the default "private 
branch" for any repository, but obviously that is often extended upon, and 
you may have several private branches. 

For example, after you've done a big update (from some external upstream 
source) in the submodule that you are using, you might decide that you do 
all the work on that new big update in a _new_ private branch within the 
submodule - and get the submodule changes all squared away on its own 
_before_ you then decide to commit the end result (the tip of that new 
private branch) within the supermodule.

Ie, you very much should be able to do

	git clone supermodule/that/one/submodule my-own-version-of-submodule

to clone a submodule _without_ getting anything else (but still get all 
the work you did within the submodule - very much including your own 
private branch work).

And the importance of keeping the submodule independent is partly just 
stability and sanity, but partly also scalability. For example, the 
"index" in a supermodule should NOT include the indexes of all the 
submodules. That's really important, because the index doesn't really 
scale. Things do slow down with large indexes. 

For example, git can handle tens of thousands of files easily. I suspect 
it scales well to hundreds of thousands of filenames. But with 
supermodules, you really can end up in the situation where you have _tens_ 
of these submodules, maybe even hundreds. And if you try to maintain one 
unified index for the _whole_ thing, I guarantee you that you'll start 
feeling the pain. Indexing millions of files is just not going to be 
pretty.

So just from a git stability and scalability point, it's important to keep 
subprojects _separate_. There is obviously integration stuff, but they 
should still be seen as truly independent projects. Even the supermodule 
should have clearly its own life even _regardless_ of submodules, because 
(as I said) quite often you may want the supermodule, but you don't want 
to have _all_ of the submodules.

But it's more than that stability and scalability thing too - keeping them 
separate is what allows you to do pulls and pushes on an individual 
subproject basis, and have people really work at that level. For example, 
if you're the compiler guy at a company, you really do want to work with 
other compiler people _outside_ the company, but you sure as hell may not 
be able to give them access to your supermodule. But you may want to work 
on _just_ the compiler parts (or at least share some branches in public), 
which means that the subproject really has to be able to work 
_independently_ of the supermodule.

So "independent" here is really key, for several reasons. And that all 
means, for example, that there must NEVER be any "backpointers". A 
subproject really can _never_ have backpointers to the superproject, 
because that fundamentally means that the above kind of "compiler guy 
works on the compiler subproject in public" cannot work, if your 
supermodule isn't public.

* Re: [RFC] Submodules in GIT
  2006-12-01 20:13                                                                         ` Linus Torvalds
@ 2006-12-01 20:30                                                                           ` Martin Waitz
  2006-12-01 23:23                                                                             ` Alan Chandler
  2006-12-01 22:06                                                                           ` Josef Weidendorfer
                                                                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 20:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git

hoi :)

Linus, you are a lot better at describing my thoughts than I am myself.
;-)

-- 
Martin Waitz

* Re: [RFC] Submodules in GIT
  2006-12-01 19:38                                                                       ` Martin Waitz
@ 2006-12-01 21:04                                                                         ` Andy Parkins
  2006-12-01 21:37                                                                           ` Martin Waitz
  2006-12-02 13:14                                                                           ` Jakub Narebski
  0 siblings, 2 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 21:04 UTC (permalink / raw)
  To: git

On Friday 2006, December 01 19:38, Martin Waitz wrote:

> On Fri, Dec 01, 2006 at 07:17:17PM +0000, Andy Parkins wrote:
> > In fact, why should the submodule commits be even visible in the
> > supermodule?  That tree->submodule commit is sufficient; there isn't
> > any need to view submodule history in the supermodule.
>
> Well, but there is a need for a common object traversal.
> You need that when sending all objects between two supermodule versions
> and also when you determine which objects are still reachable.

No you don't; when traversing the supermodule history you will come across 
trees that have submodule commit hashes in them, that is all the other end 
needs to know.  If it wants it can then connect to the submodule and clone 
submodule to submodule.  The whole operation doesn't have to be done in the 
supermodule though.

> The easiest way to implement the common object traversal is to have all
> objects in one object repository.

That's true; but is it the right way?  I really really think the submodule 
objects should be in the submodule itself.

> It may be possible to use two object stores and still do the common
> object traversal but I do not think that gives you any benefits.

There is one benefit - you can git-clone the submodule just as you would if it 
were not a submodule.  In fact, from the submodule's point of view it knows 
nothing about the supermodule.

> You still don't have a totally separated repository then, because
> you can't do a reachability analysis in the submodule repository alone.

I'm going to guess by reachability analysis, you mean that the submodule 
doesn't know that some of its commits are referenced by the supermodule.  As 
I suggested elsewhere in the thread, that's easily fixed by making a 
refs/supermodule/commitXXXX file for each supermodule commit that references 
a particular submodule commit.  Then you can git-prune, git-fsck whenever 
you want.

Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE

* Re: [RFC] Submodules in GIT
  2006-12-01 21:04                                                                         ` Andy Parkins
@ 2006-12-01 21:37                                                                           ` Martin Waitz
  2006-12-01 21:54                                                                             ` Andy Parkins
  2006-12-02 13:14                                                                           ` Jakub Narebski
  1 sibling, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 21:37 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

hoi :)

On Fri, Dec 01, 2006 at 09:04:37PM +0000, Andy Parkins wrote:
> > It may be possible to use two object stores and still do the common
> > object traversal but I do not think that gives you any benefits.
> 
> There is one benefit - you can git-clone the submodule just as you
> would if it were not a submodule.  In fact, from the submodule's point
> of view it knows nothing about the supermodule.

The submodule repository obviously has to be able to reach all its objects.
This is easily doable with the shared object database.

So you can already clone the submodule standalone.

> I'm going to guess by reachability analysis, you mean that the
> submodule doesn't know that some of its commits are referenced by the
> supermodule.  As I suggested elsewhere in the thread, that's easily
> fixed by making a refs/supermodule/commitXXXX file for each
> supermodule commit that references a particular submodule commit.

I wouldn't call this "easily".

-- 
Martin Waitz

* Re: [RFC] Submodules in GIT
  2006-12-01 21:37                                                                           ` Martin Waitz
@ 2006-12-01 21:54                                                                             ` Andy Parkins
  2006-12-01 22:08                                                                               ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-01 21:54 UTC (permalink / raw)
  To: git

On Friday 2006, December 01 21:37, Martin Waitz wrote:

> > I'm going to guess by reachability analysis, you mean that the
> > submodule doesn't know that some of its commits are referenced by the
> > supermodule.  As I suggested elsewhere in the thread, that's easily
> > fixed by making a refs/supermodule/commitXXXX file for each
> > supermodule commit that references a particular submodule commit.
>
> I wouldn't call this "easily".

Of course it is; when you write a supermodule commit you have its hash, 
$SUPERMODULE_HASH, you have the commit-hash of the submodule commit you're 
referencing, $SUBMODULE_HASH.  It's not really hard to do

echo $SUBMODULE_HASH > submodule/.git/refs/supermodules/commit$SUPERMODULE_HASH

Is it?
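
Spelled out a little further, as a rough Python sketch rather than a patch:
the helper names are made up, the hashes are placeholders, and it writes
plain files under an assumed refs/supermodules/ directory instead of going
through git's own ref handling. The second function shows what a prune
would then treat as extra roots.

```python
import tempfile
from pathlib import Path


def record_supermodule_ref(submodule_git_dir, supermodule_hash, submodule_hash):
    """Write refs/supermodules/commit<supermodule_hash> inside the submodule."""
    ref_dir = Path(submodule_git_dir) / "refs" / "supermodules"
    ref_dir.mkdir(parents=True, exist_ok=True)
    ref_file = ref_dir / f"commit{supermodule_hash}"
    ref_file.write_text(submodule_hash + "\n")
    return ref_file


def pinned_submodule_commits(submodule_git_dir):
    """Return the submodule commits that supermodule refs keep alive."""
    ref_dir = Path(submodule_git_dir) / "refs" / "supermodules"
    if not ref_dir.is_dir():
        return set()
    return {p.read_text().strip() for p in ref_dir.iterdir() if p.is_file()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp) / "submodule" / ".git"
        record_supermodule_ref(git_dir, "1111", "aaaa")  # placeholder hashes
        record_supermodule_ref(git_dir, "2222", "bbbb")
        print(sorted(pinned_submodule_commits(git_dir)))
```

One ref per supermodule commit is all the submodule needs in order to know
which of its commits must survive a prune.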


Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE

* Re: [RFC] Submodules in GIT
  2006-12-01 20:13                                                                         ` Linus Torvalds
  2006-12-01 20:30                                                                           ` Martin Waitz
@ 2006-12-01 22:06                                                                           ` Josef Weidendorfer
  2006-12-01 22:12                                                                             ` Martin Waitz
  2006-12-01 22:26                                                                             ` Linus Torvalds
  2006-12-01 22:35                                                                           ` sf
  2006-12-08 18:29                                                                           ` Jon Loeliger
  3 siblings, 2 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-01 22:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git, Martin Waitz

On Friday 01 December 2006 21:13, Linus Torvalds wrote: 
> > There seems to be some agreement to store the commit id of
> > the submodule instead of a plain tree id in the supermodule's tree object, and
> > that all objects that are reachable from this commit are made part of the
> > supermodule repository (either fetched or via alternates). Do you agree?
> 
> Well, I would actually argue that you may often want to have a supermodule 
> and then at least have the _option_ to decide to not fetch all the 
> submodules.

If you want to allow this, you have to be able to cut off fetching the
objects of the supermodule at borders to given submodules, the ones you
do not want to track. With "border" I mean the submodule commit in some
tree of the supermodule.

This looks a little bit like a shallow clone, where you introduce
graft points at the border to some of the submodule's object DAGs.
But I am not sure that this is scalable: for supermodules with
a large number of submodules you are not interested in,
your graft file would grow very fast, as there will be new borders
with every change in some submodule, which happens to be tracked
in the supermodule.

So IMHO, instead of a huge graft file, you want to have a fast way
to check at a submodule border which submodule this given border is
going into. Then, at fetch time, you easily can decide that you do
not want to fetch any object from the submodule.
Otherwise, you would have to ask the remote end at cloning time:
"Is this commit from some submodule I am locally not interested in?"

So I think we should introduce a submodule namespace in supermodules.
And at every border from super- to submodules, the name of the
submodule we are going into should be specified.
Which actually means that we need to introduce a "submodule" object,
and trees of a supermodule can have such submodule objects as borders
into a submodule. In a submodule object, of course we have the
SHA1 of the commit into the submodule DAG, and there would be the globally
unique name we have chosen for this submodule in this supermodule.
Something like

 submodule: gcc
 commit: 6287376...

Before cloning a supermodule, you should be able to list the names of
the submodules available, and select the submodules you want to have
cloned together with the supermodule.
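
A small Python sketch of how this could look. It is only an illustration:
the two-field format is the one from the example above, while the tree
layout, the paths, the commit ids and the function names are all invented.

```python
def parse_submodule_object(text):
    """Parse 'key: value' lines of a submodule object into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Toy supermodule tree: path -> ("blob", id) or ("submodule", object text).
# Paths and ids are placeholders, not real objects.
TREE = {
    "Makefile": ("blob", "b1"),
    "tools/gcc": ("submodule", "submodule: gcc\ncommit: 1234abc"),
    "tools/binutils": ("submodule", "submodule: binutils\ncommit: 5678def"),
}


def list_submodules(tree):
    """Names a client could be shown before cloning."""
    return sorted(
        parse_submodule_object(payload)["submodule"]
        for kind, payload in tree.values()
        if kind == "submodule"
    )


def commits_to_fetch(tree, wanted):
    """Map each wanted submodule name to the commit at its border; skip the rest."""
    result = {}
    for kind, payload in tree.values():
        if kind != "submodule":
            continue
        obj = parse_submodule_object(payload)
        if obj["submodule"] in wanted:
            result[obj["submodule"]] = obj["commit"]
    return result


if __name__ == "__main__":
    print(list_submodules(TREE))
    print(commits_to_fetch(TREE, {"gcc"}))
```

The decision to skip a border needs only the name stored in the submodule
object, never the path at which the submodule happens to sit in the tree.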

> Ie, you very much should be able to do
>
>         git clone supermodule/that/one/submodule my-own-version-of-submodule
>
> to clone a submodule _without_ getting anything else (but still get all
> the work you did within the submodule - very much including your own
> private branch work).

So in the example, "that/one/submodule" is _not_ the path of the working
tree which happens to be the root of the submodule at current supermodule
HEAD, but the unique name from the submodule namespace.

This is important, as you should be able to move the root of a submodule
inside of your supermodule like moving any other file or directory.
I.e. for every supermodule commit, the path to the root directory of a
given submodule can change, making it useless as a name for a submodule
selection at clone time.

* Re: [RFC] Submodules in GIT
  2006-12-01 21:54                                                                             ` Andy Parkins
@ 2006-12-01 22:08                                                                               ` Martin Waitz
  2006-12-02 10:04                                                                                 ` Andy Parkins
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 22:08 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

hoi :)

On Fri, Dec 01, 2006 at 09:54:32PM +0000, Andy Parkins wrote:
> On Friday 2006, December 01 21:37, Martin Waitz wrote:
> 
> > > I'm going to guess by reachability analysis, you mean that the
> > > submodule doesn't know that some of its commits are referenced by the
> > > supermodule.  As I suggested elsewhere in the thread, that's easily
> > > fixed by making a refs/supermodule/commitXXXX file for each
> > > supermodule commit that references a particular submodule commit.
> >
> > I wouldn't call this "easily".
> 
> Of course it is; when you write a supermodule commit you have its hash, 
> $SUPERMODULE_HASH, you have the commit-hash of the submodule commit you're 
> referencing, $SUBMODULE_HASH.  It's not really hard to do
> 
> echo $SUBMODULE_HASH > submodule/.git/refs/supermodules/commit$SUPERMODULE_HASH

I guess you are aware that you have to scan _all_ trees inside _all_
supermodule commits for possible references.

So what do you do with deleted submodules?
You wouldn't want them to still sit around in your working directory,
but you still have to preserve them.
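
A toy Python illustration of the problem, with an invented history and
invented ids: to know which submodule commits must be kept you have to walk
every supermodule commit, and the result includes submodules that no longer
exist in the newest tree.

```python
# Toy supermodule history, oldest first: each commit maps a submodule path
# to the submodule commit id recorded in its tree (all values invented).
SUPER_HISTORY = [
    {"lib/a": "a1", "lib/b": "b1"},
    {"lib/a": "a2", "lib/b": "b1"},
    {"lib/a": "a3"},  # lib/b was deleted in this commit
]


def referenced_submodule_commits(history):
    """Collect, per submodule path, every commit id any supermodule commit uses."""
    refs = {}
    for tree in history:
        for path, commit_id in tree.items():
            refs.setdefault(path, set()).add(commit_id)
    return refs


if __name__ == "__main__":
    for path, ids in sorted(referenced_submodule_commits(SUPER_HISTORY).items()):
        print(path, sorted(ids))
```

Here lib/b is gone from the latest tree, yet its commit b1 still has to be
kept somewhere, because older supermodule commits refer to it.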

-- 
Martin Waitz

* Re: [RFC] Submodules in GIT
  2006-12-01 22:06                                                                           ` Josef Weidendorfer
@ 2006-12-01 22:12                                                                             ` Martin Waitz
  2006-12-01 22:26                                                                               ` Josef Weidendorfer
  2006-12-01 23:17                                                                               ` Josef Weidendorfer
  2006-12-01 22:26                                                                             ` Linus Torvalds
  1 sibling, 2 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 22:12 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Linus Torvalds, sf, git

hoi :)

On Fri, Dec 01, 2006 at 11:06:40PM +0100, Josef Weidendorfer wrote:
> > Well, I would actually argue that you may often want to have a
> > supermodule and then at least have the _option_ to decide to not
> > fetch all the submodules.
> 
> If you want to allow this, you have to be able to cut off fetching the
> objects of the supermodule at borders to given submodules, the ones you
> do not want to track. With "border" I mean the submodule commit in some
> tree of the supermodule.

I don't think this is something special to submodules.  There has been
interest in checking out only a part of the tree even before talking
about submodules and I really think this feature should be independent
of submodules.

-- 
Martin Waitz

* Re: [RFC] Submodules in GIT
  2006-12-01 22:12                                                                             ` Martin Waitz
@ 2006-12-01 22:26                                                                               ` Josef Weidendorfer
  2006-12-01 22:40                                                                                 ` Martin Waitz
  2006-12-01 23:17                                                                               ` Josef Weidendorfer
  1 sibling, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-01 22:26 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Linus Torvalds, sf, git

On Friday 01 December 2006 23:12, Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 11:06:40PM +0100, Josef Weidendorfer wrote:
> > > Well, I would actually argue that you may often want to have a
> > > supermodule and then at least have the _option_ to decide to not
> > > fetch all the submodules.
> > 
> > If you want to allow this, you have to be able to cut off fetching the
> > objects of the supermodule at borders to given submodules, the ones you
> > do not want to track. With "border" I mean the submodule commit in some
> > tree of the supermodule.
> 
> I don't think this is something special to submodules.  There has been
> interest in checking out only a part of the tree even before talking
> about submodules and I really think this feature should be independent
> of submodules.

It's not about checking out part of the tree, it's about fetching only
part of the objects: If you have a slow modem and want to clone a
supermodule, you are not interested in fetching all the objects from
some submodules.

So it is more like a shallow clone. But even here, submodules are special
as you have defined borders between supermodule and submodules. This gives
you the freedom to introduce a submodule namespace, and allows you to
point to a submodule: "I do not want you!".

A shallow clone has no such border to point at, so there you need
to use something like grafting.

BTW: In your submodule implementation, is the user allowed to change the
relative path of the root of some submodule, e.g. with "git-mv" ?

Josef

* Re: [RFC] Submodules in GIT
  2006-12-01 22:06                                                                           ` Josef Weidendorfer
  2006-12-01 22:12                                                                             ` Martin Waitz
@ 2006-12-01 22:26                                                                             ` Linus Torvalds
  2006-12-01 22:41                                                                               ` sf
  2006-12-01 22:55                                                                               ` Josef Weidendorfer
  1 sibling, 2 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-01 22:26 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: sf, git, Martin Waitz

On Fri, 1 Dec 2006, Josef Weidendorfer wrote:
> > 
> > Well, I would actually argue that you may often want to have a supermodule 
> > and then at least have the _option_ to decide to not fetch all the 
> > submodules.
> 
> If you want to allow this, you have to be able to cut off fetching the
> objects of the supermodule at borders to given submodules, the ones you
> do not want to track. With "border" I mean the submodule commit in some
> tree of the supermodule.
>
> This looks a little bit like a shallow clone

No. 

I would say that it looks more like a "partial checkout" than a shallow 
clone.

A shallow clone limits the data in "time" - we have _some_ data, but we 
don't have all of the history of that data.

In contrast, a submodule that we don't fetch is an all-or-nothing 
situation: we simply don't have the data at all, and it's really a matter 
of simply not recursing into that submodule at all - much more like not 
checking out a particular part of the tree.

So if a shallow clone is a "limit in time", a lack of a module (or a lack 
of a checkout for a subtree in general - you could certainly imagine doing 
the same thing even _within_ a git repository, and indeed, we did discuss 
exactly that at one point in time) is more of a "limit in space".
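
A minimal sketch of the two kinds of limit, using a toy in-memory model
rather than real git objects. The function `fetch`, its `depth` and `skip`
parameters, and the ids `c1`..`c3` are all made up for illustration:

```python
# Toy model: the supermodule history is a list of snapshots, newest first.
# A snapshot maps a path to file content or to a ("submodule", commit_id)
# marker. This is NOT git's real data model, only an illustration.
history = [
    {"Makefile": "v3", "gcc": ("submodule", "c3")},
    {"Makefile": "v2", "gcc": ("submodule", "c2")},
    {"Makefile": "v1", "gcc": ("submodule", "c1")},
]

# Data that lives in the submodule, keyed by submodule commit id.
submodule_data = {"c1": "gcc sources 1", "c2": "gcc sources 2", "c3": "gcc sources 3"}


def fetch(history, submodule_data, depth=None, skip=()):
    """Return (snapshots, submodule data) that a fetch would transfer."""
    # Limit in time: keep only the newest `depth` snapshots.
    snaps = history if depth is None else history[:depth]
    fetched = {}
    for snap in snaps:
        for path, entry in snap.items():
            # Limit in space: do not recurse into skipped submodules.
            if isinstance(entry, tuple) and path not in skip:
                fetched[entry[1]] = submodule_data[entry[1]]
    return snaps, fetched


shallow_snaps, shallow_subs = fetch(history, submodule_data, depth=1)
partial_snaps, partial_subs = fetch(history, submodule_data, skip={"gcc"})
```

In the `skip` case every supermodule snapshot is still there and still names
the submodule commit; only the data behind that commit is absent.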


* Re: [RFC] Submodules in GIT
  2006-12-01 20:13                                                                         ` Linus Torvalds
  2006-12-01 20:30                                                                           ` Martin Waitz
  2006-12-01 22:06                                                                           ` Josef Weidendorfer
@ 2006-12-01 22:35                                                                           ` sf
  2006-12-08 18:29                                                                           ` Jon Loeliger
  3 siblings, 0 replies; 252+ messages in thread
From: sf @ 2006-12-01 22:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Martin Waitz

Linus Torvalds wrote:
> 
> On Fri, 1 Dec 2006, sf wrote:
>> Linus Torvalds wrote:
>> ...
>>> Think of it this way: one common use for submodules is really to just
>>> (occasionally) track somebody else's code. The submodule should be a
>>> totally pristine copy from somebody else (ie it might be the "intel driver
>>> for X.org" submodule, maintained within intel), and the supermodule just
>>> refers to it indirectly (ie the supermodule might be the "Fedora Core X
>>> group" which contains all the different drivers from different people).
>> Could you please be a little bit more specific about how you would store the
>> "pristine copy".
> 
> Note that it's not necessarily "pristine", since the submodule clearly is 
> a local git repository in its own right. So like _any_ git repository, you 
> can (and may well end up) having your own local branches in the submodule, 
> with your own local modifications.
> 
> So I'm not claiming that a submodule must always match some external git 
> tree 100%, and that it must be read-only or anything like that. I'm just 
> saying that I suspect that quite often, one of the MOST IMPORTANT parts is 
> that the submodule is really something that somebody else technically 
> maintains, and that this is actually one of the _reasons_ why it is a 
> submodule in the first place. 
> 
> For example, a lot of projects end up having some kind of "library 
> component" as a submodule. Take something like a video player project, 
> which would have something like ffmpeg as a submodule, not because you'd 
> maintain ffmpeg yourself, but simply because (let's say) the library 
> interface changes enough, or you need a specific version with some of your 
> own fixes that haven't been released widely yet, so you want to carry all 
> the libraries you need _with_ you, even though you don't really maintain 
> that submodule. You at most have some small extensions of your own.
> 
> Now, in this situation, it's really, really _important_ that the submodule 
> really is totally independent of the supermodule, for several reasons.
> 
> For example, since you don't "really" own that project, carrying around 
> your own fixes is really really painful. We know it happens all the time, 
> and a lot of projects end up needing their own version, but the _last_ 
> thing you want is to be in merge hell all the time. So as a supermodule 
> maintainer, the best possible thing for you is to be able to push back 
> those local changes to the original project maintainer, so that you 
> _don't_ have to maintain your own changes.

True. But if you need the changes to the submodule for your supermodule
to function, and upstream either does not want to merge your changes or
the merge will be available only after a long time, then what is the
alternative? You must be able to keep local changes, and you must be
able to keep pulling from upstream. Of course, what you describe is the
ideal case: You find a bug, push the fix upstream, and in no time at all
your fix is merged and you can just pull a new version into your
superproject, but that might be wishful thinking.

> But you need to realize that the real maintainer of the submodule is 
> TOTALLY UNINTERESTED in your supermodule. He's not going to maintain it, 
> and in fact, if you have anything in the submodule that ends up talking 
> about your supermodule, that's just going to make it a lot less likely 
> that the upstream maintainer will ever pull your changes. He might take a 
> diff from you, but in a perfect world, you'd actually be able to tell him: 
> 
>  "Hey, I've got a git repository with a few fixes to your ffmpeg git tree, 
>   please pull from git://myhost.com/submodule.git to get these fixes:
> 
> 	... explanation of fixes and commits that are relevant to
> 	ffmpeg, and have nothing to do with the supermodule, except
> 	that you need those bug-fixes because you _use_ ffmpeg ...
> 
>   Thanks"
> 
> See?

No! All you need is a naming scheme to address the commit of the
subproject that should be pulled. The extreme case would be to just
address it with its id (well, currently you cannot do that with git
pull, but that is fixable). But I already proposed a syntax for naming
commits which are "hidden" in a superproject: Just name the path as
described in git-rev-parse and append double slashes (to indicate that
you mean the commit, not the tree it contains). So no manual work needs
be done by upstream.

[snipped: about independence of submodule branches]

>> There seems to be some agreement to store the commit id of
>> the submodule instead of a plain tree id in the supermodules tree object, and
>> that all objects that are reachable from this commit are made part of the
>> supermodule repository (either fetched or via alternates). Do you agree?
> 
> Well, I would actually argue that you may often want to have a supermodule 
> and then at least have the _option_ to decide to not fetch all the 
> submodules.

[transmeta example snipped]

> So no, I don't think the supermodule should even _force_ people to always 
> get all the submodules. It might be the default case, but at the same 
> time, it's just being polite to let users decide on their own whether they 
> really want _all_ of the build infrastructure sources.

If you want to track some chosen submodules there are two easy solutions:

1. If you want to track their state as it appears from the supermodule's
view, pull from master:<submodule>//
2. If you want to track their state from their own development branches,
 pull from <submodule>/master

Can you see the difference?
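
A small sketch of telling the two forms apart, assuming the `//` suffix
proposed earlier in this mail (a proposal in this thread, not a syntax git
understands). The function name `parse_spec` and the example path are
hypothetical:

```python
def parse_spec(spec):
    """Split '<rev>:<path>//' into (rev, path).

    Returns None for anything else, e.g. an ordinary branch name such as
    '<submodule>/master', which would be resolved as a normal ref.
    """
    if ":" in spec and spec.endswith("//"):
        rev, path = spec.split(":", 1)
        return rev, path[:-2]  # drop the trailing '//'
    return None


# Form 1: the submodule commit recorded in the supermodule's master tree.
as_seen_from_super = parse_spec("master:drivers/intel//")

# Form 2: the submodule's own development branch, an ordinary ref.
own_branch = parse_spec("intel/master")
```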

>> If I understand you correctly you cannot make any changes to the submodules
>> code _in the supermodule's repository_, no bugfixes, no extensions, no
>> adaptions, nothing. Do you mean that?
> 
> Yes. I think you should make all changes _within_ the submodule, because 
> the submodule should still be an independent git tree in its own right.

Every commit is a git tree in its own right, is it not?

[description of independent submodule development snipped]

> And the importance of keeping the submodule independent is partly just 
> stability and sanity, but partly also scalability. For example, the 
> "index" in a supermodule should NOT include the indexes of all the 
> submodules. That's really important, because the index doesn't really 
> scale. Things do slow down with large indexes. 
> 
> For example, git can handle tens of thousands of files easily. I suspect 
> it scales well to hundreds of thousands of filenames. But with 
> supermodules, you really can end up in the situation where you have _tens_ 
> of these submodules, maybe even hundreds. And if you try to maintain one 
> unified index for the _whole_ thing, I guarantee you that you'll start 
> feeling the pain. Indexing millions of files is just not going to be 
> pretty.

I am not sure I understand what you say.

1. If you are working on a submodule, then the supermodule never enters
the picture. You are working independently. So far, so good.

2. If you are working on the supermodule, git will not be able to
function? How would you work without submodules, in which case you would
 have simply one large project?

> So just from a git stability and scalability point, it's important to keep 
> subprojects _separate_. There is obviously integration stuff, but they 
> should still be seen as truly independent projects. Even the supermodule 
> should have clearly its own life even _regardless_ of submodules, because 
> (as I said) quite often you may want the supermodule, but you don't want 
> to have _all_ of the submodules.
> 
> But it's more than that stability and scalability thing too - keeping them 
> separate is what allows you to do pulls and pushes on an individual 
> subproject basis, and have people really work at that level. For example, 
> if you're the compiler guy at a company, you really do want to work with 
> other compiler people _outside_ the company, but you sure as hell may not 
> be able to give them access to your supermodule. But you may want to work 
> on _just_ the compiler parts (or at least share some branches in public), 
> which means that the subproject really has to be able to work 
> _independently_ of the supermodule.

I totally agree. When I try to explain why submodules that exist only
as part of one or more supermodules work, I do not mean to say that you
cannot or should not have independent branches or repositories for the
submodules' code.

> So "independent" here is really key, for several reasons. And that all 
> means, for example, that there must NEVER be any "backpointers". A 
> subproject really can _never_ have backpointers to the superproject, 
> because that fundamentally means that the above kind of "compiler guy 
> works on the compiler subproject in public" cannot work, if your 
> supermodule isn't public.

I took that for granted: from a commit you only ever look backwards (in
time/history dimension) or downwards (in content dimension).

Regards



* Re: [RFC] Submodules in GIT
  2006-12-01 22:26                                                                               ` Josef Weidendorfer
@ 2006-12-01 22:40                                                                                 ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 22:40 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Linus Torvalds, sf, git

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 11:26:22PM +0100, Josef Weidendorfer wrote:
> It's not about checking out part of the tree, it's about fetching only
> part of the objects: If you have a slow modem and want to clone a
> supermodule, you are not interested in fetching all the objects from
> some submodules.

So when you want to suppress one submodule, how is this not about only
checking out part of the tree?
Ok, you also want to avoid downloading the submodule, but you first have
to solve the partial checkout.

> BTW: In your submodule implementation, is the user allowed to change the
> relative path of the root of some submodule, e.g. with "git-mv" ?

In principle: yes.
However there are some links between both repositories that have to be
updated manually (for the shared object repository and for ignoring
submodule files in the supermodule).
But I expect that much of this configuration stuff will vanish when
submodules are better integrated in git.

Rename detection for submodules would be another interesting thing to
have. It should be much easier than for files because we can simply check
for common ancestors and do not have to guess based on the diff.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [RFC] Submodules in GIT
  2006-12-01 22:26                                                                             ` Linus Torvalds
@ 2006-12-01 22:41                                                                               ` sf
  2006-12-01 23:03                                                                                 ` Josef Weidendorfer
  2006-12-01 23:09                                                                                 ` Linus Torvalds
  2006-12-01 22:55                                                                               ` Josef Weidendorfer
  1 sibling, 2 replies; 252+ messages in thread
From: sf @ 2006-12-01 22:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git, Martin Waitz

Linus Torvalds wrote:
...
> In contrast, a submodule that we don't fetch is an all-or-nothing 
> situation: we simply don't have the data at all, and it's really a matter 
> of simply not recursing into that submodule at all - much more like not 
> checking out a particular part of the tree.

If you do not want to fetch all of the supermodule then do not fetch the
supermodule. Instead fetch only the submodules you are interested in.
You do not have to fetch the whole repository.

Regards



* Re: [RFC] Submodules in GIT
  2006-12-01 22:26                                                                             ` Linus Torvalds
  2006-12-01 22:41                                                                               ` sf
@ 2006-12-01 22:55                                                                               ` Josef Weidendorfer
  2006-12-01 23:07                                                                                 ` Martin Waitz
  2006-12-01 23:30                                                                                 ` Linus Torvalds
  1 sibling, 2 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-01 22:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git, Martin Waitz

On Friday 01 December 2006 23:26, Linus Torvalds wrote:
> 
> On Fri, 1 Dec 2006, Josef Weidendorfer wrote:
> > > 
> > > Well, I would actually argue that you may often want to have a supermodule 
> > > and then at least have the _option_ to decide to not fetch all the 
> > > submodules.
> > 
> > If you want to allow this, you have to be able to cut off fetching the
> > objects of the supermodule at borders to given submodules, the ones you
> > do not want to track. With "border" I mean the submodule commit in some
> > tree of the supermodule.
> >
> > This looks a little bit like a shallow clone
> 
> No. 
> 
> I would say that it looks more like a "partial checkout" than a shallow 
> clone.
> 
> A shallow clone limits the data in "time" - we have _some_ data, but we 
> don't have all of the history of that data.
> 
> In contrast, a submodule that we don't fetch is an all-or-nothing 
> situation: we simply don't have the data at all, and it's really a matter 
> of simply not recursing into that submodule at all - much more like not 
> checking out a particular part of the tree.

OK.

I still think it should be about "limit in space" regarding the
objects in the local repository.

If a project contains "gcc" as a submodule and I am not interested
in that submodule, there should be a way to avoid fetching all the
objects of the gcc submodule at clone time.


What about my other argument for a submodule namespace:
You want to be able to move the relative root path of a submodule
inside of your supermodule, but yet want to have a unique name
for the submodule:
- to be able to just clone a submodule without having to know
the current position in HEAD
- more practically, e.g. to be able to name a submodule
independent from any current commit you are on in the supermodule,
e.g. to be able to store some meta information about a submodule:
- "Where is the official upstream of this submodule?"
- "Should git allow to commit rewind actions of this submodule
   in the supermodule?" (which, AFAICS, exactly has the same
   problems as publishing a rewound branch: you will get into
   merge hell when you want to pull upstream changes into the
   supermodule)
- "Should this submodule be checked out?"
and so on.



* Re: [RFC] Submodules in GIT
  2006-12-01 22:41                                                                               ` sf
@ 2006-12-01 23:03                                                                                 ` Josef Weidendorfer
  2006-12-01 23:09                                                                                 ` Linus Torvalds
  1 sibling, 0 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-01 23:03 UTC (permalink / raw)
  To: sf-gmane; +Cc: Linus Torvalds, sf, git, Martin Waitz

On Friday 01 December 2006 23:41, sf wrote:
> Linus Torvalds wrote:
> ...
> > In contrast, a submodule that we don't fetch is an all-or-nothing 
> > situation: we simply don't have the data at all, and it's really a matter 
> > of simply not recursing into that submodule at all - much more like not 
> > checking out a particular part of the tree.
> 
> If you do not want to fetch all of the supermodule then do not fetch the
> supermodule. Instead fetch only the submodules you are interested in.
> You do not have to fetch the whole repository.

But what if I *want* to fetch the supermodule because of the source
tree which is only available in the supermodule?

Of course, you can argue that the only objects in trees of a supermodule
should be submodule commits, but this would restrict the usage of
supermodules considerably.

See further arguments for a submodule namespace in my other mail.
You probably want to specify policies about submodule handling. This
information has to be indexed by some name independent from a 
supermodule commit.

Josef


* Re: [RFC] Submodules in GIT
  2006-12-01 22:55                                                                               ` Josef Weidendorfer
@ 2006-12-01 23:07                                                                                 ` Martin Waitz
  2006-12-01 23:30                                                                                 ` Linus Torvalds
  1 sibling, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-01 23:07 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Linus Torvalds, sf, git

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 11:55:03PM +0100, Josef Weidendorfer wrote:
> What about my other argument for a submodule namespace:
> You want to be able to move the relative root path of a submodule
> inside of your supermodule, but yet want to have a unique name
> for the submodule:
> - to be able to just clone a submodule without having to know
> the current position in HEAD
> - more practically, e.g. to be able to name a submodule
> independent from any current commit you are on in the supermodule,
> e.g. to be able to store some meta information about a submodule:
> - "Where is the official upstream of this submodule?"

you can always have a bare repository for all used modules lying around
in some defined location.  There is no need for a unique submodule-name.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [RFC] Submodules in GIT
  2006-12-01 22:41                                                                               ` sf
  2006-12-01 23:03                                                                                 ` Josef Weidendorfer
@ 2006-12-01 23:09                                                                                 ` Linus Torvalds
  2006-12-01 23:36                                                                                   ` Josef Weidendorfer
                                                                                                     ` (2 more replies)
  1 sibling, 3 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-01 23:09 UTC (permalink / raw)
  To: sf; +Cc: sf, git, Martin Waitz



On Fri, 1 Dec 2006, sf wrote:
> Linus Torvalds wrote:
> ...
> > In contrast, a submodule that we don't fetch is an all-or-nothing 
> > situation: we simply don't have the data at all, and it's really a matter 
> > of simply not recursing into that submodule at all - much more like not 
> > checking out a particular part of the tree.
> 
> If you do not want to fetch all of the supermodule then do not fetch the
> supermodule.

So why do you want to limit it? There's absolutely no cost to saying "I 
want to see all the common shared infrastructure, but I'm actually only 
interested in this one submodule that I work with".

Also, anybody who works on just the build infrastructure simply may not 
care about all the submodules. The submodules may add up to hundreds of 
gigs of stuff. Not everybody wants them. But you may still want to get the 
common build infrastructure.

In other words, your "all or nothing" approach is
 (a) not friendly
and
 (b) has no real advantages anyway, since modules have to be independent 
     enough that you _can_ split them off for other reasons anyway.

So forcing that "you have to take everything" mentality only has 
negatives, and no positives. Why do it?



* Re: [RFC] Submodules in GIT
  2006-12-01 22:12                                                                             ` Martin Waitz
  2006-12-01 22:26                                                                               ` Josef Weidendorfer
@ 2006-12-01 23:17                                                                               ` Josef Weidendorfer
  2006-12-02 20:24                                                                                 ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-01 23:17 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Linus Torvalds, sf, git

On Friday 01 December 2006 23:12, Martin Waitz wrote:
> hoi :)
> 
> On Fri, Dec 01, 2006 at 11:06:40PM +0100, Josef Weidendorfer wrote:
> > > Well, I would actually argue that you may often want to have a
> > > supermodule and then at least have the _option_ to decide to not
> > > fetch all the submodules.
> > 
> > If you want to allow this, you have to be able to cut off fetching the
> > objects of the supermodule at borders to given submodules, the ones you
> > do not want to track. With "border" I mean the submodule commit in some
> > tree of the supermodule.
> 
> I don't think this is something special to submodules.  There has been
> interest in checking out only a part of the tree even before talking
> about submodules and I really think this feature should be independent
> to submodules.

After some thinking: a submodule namespace is even important for checking
out only parts of a supermodule, exactly because the root of a submodule
can potentially change at every commit.

When checking out some arbitrary supermodule commit, how do you find out,
at some submodule border, that the user did not want to check out the
submodule at all? You need a way to determine the identity of the DAG you
are diving into at this border: let's say by going to the root commit of
this DAG (!). And via this identity, you have to check whether the user
has specified that he wants the submodule to be checked out.
Without any further meta information (indexed by a submodule name!), this
information is only available from the checkout the user switched from,
as there would be no file from this submodule in the working tree.

Quite a pain.
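
A minimal sketch of that lookup, on a toy commit DAG. The commit ids, the
`checkout_policy` table and both function names are hypothetical, and the
sketch follows first parents only, so it assumes each submodule history has
a single root:

```python
# Toy commit DAG: commit id -> list of parent ids.
parents = {
    "g1": [], "g2": ["g1"], "g3": ["g2"],  # history of one submodule
    "x1": [], "x2": ["x1"],                # history of another submodule
}


def root_of(commit, parents):
    """Follow first parents until a commit without parents is reached."""
    while parents[commit]:
        commit = parents[commit][0]
    return commit


# User's choice, keyed by root commit instead of by path, so it survives
# the submodule being moved around inside the supermodule.
checkout_policy = {"g1": False, "x1": True}


def should_check_out(submodule_commit):
    """Look up the policy for whatever DAG this commit belongs to."""
    root = root_of(submodule_commit, parents)
    return checkout_policy.get(root, True)  # unknown submodules: check out
```

Walking to the root at every border is exactly the cost complained about
above; a name-indexed table would replace `root_of` with a direct lookup.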


* Re: [RFC] Submodules in GIT
  2006-12-01 20:30                                                                           ` Martin Waitz
@ 2006-12-01 23:23                                                                             ` Alan Chandler
  0 siblings, 0 replies; 252+ messages in thread
From: Alan Chandler @ 2006-12-01 23:23 UTC (permalink / raw)
  To: git

On Friday 01 December 2006 20:30, Martin Waitz wrote:
> hoi :)
>
> Linus, you are a lot better in describing all my thoughts than I myself.
> ;-)

Yes

Some of the clearest explanations described from a strategic point of view 
come from posts such as these. 

And he keeps saying he can't write documentation.

 
-- 
Alan Chandler


* Re: [RFC] Submodules in GIT
  2006-12-01 22:55                                                                               ` Josef Weidendorfer
  2006-12-01 23:07                                                                                 ` Martin Waitz
@ 2006-12-01 23:30                                                                                 ` Linus Torvalds
  2006-12-02  0:14                                                                                   ` Josef Weidendorfer
  1 sibling, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-12-01 23:30 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: sf, git, Martin Waitz

On Fri, 1 Dec 2006, Josef Weidendorfer wrote:
> 
> What about my other argument for a submodule namespace:
> You want to be able to move the relative root path of a submodule
> inside of your supermodule, but yet want to have a unique name
> for the submodule:
> - to be able to just clone a submodule without having to know
> the current position in HEAD

Umm? I don't get the issue. A submodule is a git repo in its own right, 
and you clone it exactly like you'd clone any other repo. It _does_ have a 
HEAD. It has its own branches. It has everything.

So when you clone a submodule, you always get all those branches. The 
supermodule will not _point_ to them all (the branches are local to the 
submodule, and _will_ depend on things like "which upstream module am I 
tracking"), but they'll have to be there, exactly _because_ the submodule 
has an existence and is tracked on its own.

In the trivial case where the submodule doesn't even _have_ any external 
existence at all (ie it's always maintained as _just_ a submodule), it 
would probably tend to have just one branch, and a clone would get 
whatever that branch is, but that's just a degenerate special case of the 
much richer "this submodule actually has a life of its own" case.

> - more practically, e.g. to be able to name a submodule
> independent from any current commit you are on in the supermodule,
> e.g. to be able to store some meta information about a submodule:

The current commit within the supermodule would be _totally_ invisible to 
the submodule.

Of course, if HEAD _differs_ from that commit within the supermodule, then 
a "git diff" (when done from within the supermodule) should show that, but 
again, that's actually only as seen from the _supermodule_. 

> - "Where is the official upstream of this submodule?"

That's entirely a question for the submodule. You cannot ask that question 
within the confines of the supermodule, because it's not even a relevant 
question in that context. Two different supermodule repositories may well 
decide to get their submodules from different places, just because they 
got cloned from different places (or even just for practical reasons like 
"that other site is closer to me").

So the official upstream of a submodule must NOT be encoded inside the 
supermodule (or at least not within its _objects_). Exactly because the 
upstream location is not a "global" thing - it's per-repository, and thus 
must not be encoded in the global data (ie the objects).

It should be encoded in some _ephemeral_ place, eg in the ".git/config" 
file or in a ".git/remotes/origin"-like file (either in the supermodule or 
the submodule, and I would seriously suggest you do it within the 
submodule itself, because you'll want it exactly when you decide to work 
on the submodule and upgrade _that_).
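
A toy illustration of the split between global objects and per-repository
configuration. The `Repo` class, the `ffmpeg.url` key and the example URLs
are assumptions made for this sketch, not git's real layout:

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    objects: dict                               # global: same in every clone
    config: dict = field(default_factory=dict)  # local: differs per clone


# The supermodule tree only records *which* submodule commit it uses.
shared_objects = {"tree:super": {"ffmpeg": "commit:abc123"}}

# Two clones of the same supermodule that fetch the submodule from
# different places, e.g. because one mirror is closer.
alice = Repo(objects=dict(shared_objects),
             config={"ffmpeg.url": "git://example.org/ffmpeg.git"})
bob = Repo(objects=dict(shared_objects),
           config={"ffmpeg.url": "git://mirror.example.net/ffmpeg.git"})

same_history = alice.objects == bob.objects
same_upstream = alice.config == bob.config
```

Both clones agree on every object, so they can exchange commits freely even
though they disagree on where the submodule comes from.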

> - "Should git allow to commit rewind actions of this submodule
>    in the supermodule?" (which, AFAICS, exactly has the same
>    problems as publishing a rewound branch: you will get into
>    merge hell when you want to pull upstream changes into the
>    supermodule)

The only thing that a submodule must NOT be allowed to do on its own is 
pruning (and its distant cousin "git repack -d"). You must always prune 
from the supermodule, because the submodule cannot really know on its own 
what references point into it.

(There are alternatives. One alternative is to never allow rewinding - or 
deletion - of branches in a submodule, and thus solve the problem that 
way. That is the easier solution, because it also means that a "clone" of 
a supermodule can just recursively clone the submodules independently 
_without_ having to worry about reachability, but it's really _really_ 
draconian).
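
A minimal sketch of why the submodule cannot prune on its own, on a toy
history in which a submodule branch was rewound from `old` and then advanced
to `s3`. All ids and names are hypothetical:

```python
# Toy submodule history: commit id -> list of parent ids.
sub_parents = {"s1": [], "s2": ["s1"], "s3": ["s2"], "old": ["s1"]}


def reachable(tips, parents):
    """All commits reachable from the given tips by following parents."""
    seen, todo = set(), list(tips)
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return seen


sub_refs = {"s3"}      # branches the submodule itself knows about
super_links = {"old"}  # commit still recorded in some supermodule tree

# Pruning from inside the submodule only sees its own refs.
pruned_alone = set(sub_parents) - reachable(sub_refs, sub_parents)

# Pruning from the supermodule also counts the commits it points at.
pruned_from_super = set(sub_parents) - reachable(sub_refs | super_links, sub_parents)
```

Pruned on its own, the submodule would drop `old`, which an older
supermodule commit still needs.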

> - "Should this submodule be checked out?"

This, I think, requires too much configuration to say separately for every 
possible submodule, so I would suggest that the way to make that decision 
is:

 - "git clone" by default will fetch and check out all submodules (and 
   obviously they have to be described some way outside of the object 
   database, just so that you don't have to parse the _whole_ history of 
   the _whole_ supermodule just to find all possible submodules. So the 
   supermodule _will_ need some "list of submodules and where to get them" 
   in a config file or other).

 - add a flag (possibly just re-use the current "-n" flag) that disables 
   that recursive fetching of submodules entirely.

 - have a way to fetch individual submodules one-by-one (that capacity 
   obviously has to be there anyway, since the "recursive" git clone has 
   to be able to do it, so this is likely just "git clone" again, with 
   just logic added to say "when you clone something and are _already_ 
   within a superproject, the clonee becomes a subproject automatically").
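
The three points above can be sketched as a small decision function. The
name `plan_clone`, its parameters and the example URLs are hypothetical;
this only models which submodules would be fetched, not how:

```python
def plan_clone(submodules, recurse=True, only=()):
    """Return the submodule paths that would be fetched.

    submodules: mapping path -> URL, standing in for the "list of
                submodules and where to get them" kept outside the
                object database.
    recurse:    False models a flag that disables recursive fetching.
    only:       paths fetched one-by-one, regardless of `recurse`.
    """
    wanted = set(submodules) if recurse else set()
    wanted |= set(only) & set(submodules)  # ignore paths that do not exist
    return sorted(wanted)


subs = {
    "gcc": "git://example.org/gcc.git",
    "xorg": "git://example.org/xorg.git",
}

default_plan = plan_clone(subs)                             # everything
bare_plan = plan_clone(subs, recurse=False)                 # supermodule only
single_plan = plan_clone(subs, recurse=False, only=["gcc"])  # one submodule
```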

I dunno. And I'd also like to point out that things don't have to all work 
fully before we can do at least some cases of this. For example, if the 
initial version just always clones everything, big deal. I'm not saying 
that we have to have support for things like this on "Day 1", I'm just 
saying that I think people will want to be able to not fetch and check out 
everything, so the design should _allow_ for it.

(But I also think that as long as submodules are independent enough, the 
"design" part should fall out on its own, and it just becomes a "small 
matter of programming" to actually get it to work).


* Re: [RFC] Submodules in GIT
  2006-12-01 18:48                                                                     ` Martin Waitz
@ 2006-12-01 23:34                                                                       ` sf
  2006-12-02 19:46                                                                         ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: sf @ 2006-12-01 23:34 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Martin Waitz wrote:
> On Fri, Dec 01, 2006 at 06:33:12PM +0100, Stephan Feder wrote:
>> We are in agreement about two fundamental parts of the implementation 
>> and their meaning:
>>
>> 1. A submodule is stored as a commit id in a tree object.
>>
>> 2. Every object that is reachable from the submodule's commit is 
>> reachable from the supermodule's repository.
> 
> Correct.

Good. For me that is the main point. As I said before the user interface
is not so important because it can be changed anytime, but to change the
object database later is close to impossible.

...
> Let's see if I understand you correctly:
> 
> You don't want to create an additional .git directory for the submodule
> and just handle everything with one toplevel .git repository for the
> whole project.

Yes.

> Without the .git directory, you of course do not have refs/heads inside
> the submodule.

Correct.

> So this is a different user-interface approach to submodules when
> compared to my approach.  But the basis is the same and both could
> inter-operate.

Big YES.

> Now your submodule is no longer seen as an independent git repository
> and I think this would cause problems when you want to push/pull between
> the submodule and its upstream repository.

You can always pick a single commit or several commits out of a larger
repository and have a complete git repository.

And I already explained how to push and pull even from within superprojects.

> No technical problems, but UI-problems because now your submodule is
> handled completely differently from a "normal" repository.

Yes and no. You can always have branches that are only concerned with
submodules' code, say, in refs/heads/submodules/<submodule>/.
"submodules" here is simply an example and has no deeper meaning. You
could call it foo or whatever you like. Or you could use
refs/heads/<submodule>/ if it suits you.

But if you mean the submodule as seen from the supermodule, then there
is a difference. Naturally, because the concept of submodules is new to git.

>>> Yes, but if it would have been integrated as a submodule it obviously
>>> would have been committed to the xdiff submodule inside the git
>>> repository.
>> No. The xdiff submodule would only exist as part of the git repository. 
> 
> But you could still call the "xdiff" part of the git repository a
> submodule.  And then changes to the xdiff directory result in a new
> submodule commit, even when there is no direct reference to it.
> So you'd still "commit to the xdiff submodule".

Let's make certain that we understand each other. I see a clear
distinction between the submodule code in a supermodule branch (commits
in the supermodule's tree and nothing else) and submodule branches which
are independent of the superproject. Supermodule branches and submodule
branches do not interact unless I want them to.

> 
>> You could, f.e., access the xdiff commit in git HEAD as HEAD:xdiff// 
>> (again my proposed syntax). HEAD:xdiff//~2:xemit.c would give you the 
>> grandparent of xemit.c in the xdiff submodule.
> 
> git-cat-file commit HEAD:xdiff already works out of the box (even
> cat-file tree to get the submodule tree).  But up to now revision
> parsing follows the file name only once.
> 
> What about just separating things with "/"?
> 
> commit HEAD
> tree   HEAD/
> blob   HEAD/Makefile
> commit HEAD/xdiff
> tree   HEAD/xdiff/
> blob   HEAD/xdiff~2/xemit.c
> 
> this may add some confusion when used with hierarchical branches, but
> it's still unique:
> 
> 	refs/heads/master/xdiff/xemit.c
> 
> Just use as many path components until a matching reference is found,
> then start peeling.
> Or just use / between super and submodule:
> 
> 	refs/heads/master:xdiff/xemit.c
> 
> I think this is easier to read than
> 
> 	refs/heads/master:xdiff//:xemit.c

The double slash is the only way I can think of that clearly indicates
that I do not mean the contents named by the path, but the commit that
you find there. Once you have named a commit in that way, you can
continue to apply other revision naming suffixes, paths, and so on.

Let's try. What does git cat-file -p
master:dir/sub//^^^:sub/dir/sub//^:dir/file mean?

Explanation: Take branch master and go to path dir/sub. There you will
find a commit. Take its great-grandparent and go to path sub/dir/sub
(the first sub is a subproject as well but we do not care). There you
will, again, find a commit. Take its parent and go to path dir/file
which happens to be a blob the contents of which you want to cat.
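
The walk described above can be sketched as a small parser. This is only
an illustration of the proposed "//" syntax, which git does not implement;
the function name parse_nested_spec is hypothetical, and the sketch only
splits the name into hops without resolving any objects.

```python
def parse_nested_spec(spec):
    """Split 'rev:path//suffix:path//...' into (revision, path) hops.

    Each '//' means "take the commit found at the preceding path".
    The first hop carries a full revision name; later hops carry only
    a suffix (such as '^^^') applied to the commit found by the
    previous hop. A hop with no ':' has no path and yields None.
    """
    hops = []
    for segment in spec.split("//"):
        rev, sep, path = segment.partition(":")
        hops.append((rev, path if sep else None))
    return hops


for hop in parse_nested_spec("master:dir/sub//^^^:sub/dir/sub//^:dir/file"):
    print(hop)
```

Each tuple reads as "apply this revision expression, then descend to this
path"; a trailing "//" with nothing after it, as in HEAD:xdiff//, produces
a final hop with an empty suffix and no path, i.e. the commit itself.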

In reality you will never see these kinds of complex paths. Have you
ever seen something like git cat-file -p
bd2c39f58f915af532b488c5bda753314f0db603~12^{commit}^2^5~8^2~308:README ?

>> If you mean the submodule repository created by init-module I 
>> understand. But why create this "helper repository at all"?
> 
> Because it helps "normal" git operations ;-)

Let's see. I still have to try.

Regards


* Re: [RFC] Submodules in GIT
  2006-12-01 23:09                                                                                 ` Linus Torvalds
@ 2006-12-01 23:36                                                                                   ` Josef Weidendorfer
  2006-12-02  0:12                                                                                     ` Linus Torvalds
  2006-12-01 23:49                                                                                   ` sf
  2006-12-02 20:12                                                                                   ` Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-01 23:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git, Martin Waitz

On Saturday 02 December 2006 00:09, Linus Torvalds wrote:
> 
> On Fri, 1 Dec 2006, sf wrote:
> > Linus Torvalds wrote:
> > ...
> > > In contrast, a submodule that we don't fetch is an all-or-nothing 
> > > situation: we simply don't have the data at all, and it's really a matter 
> > > of simply not recursing into that submodule at all - much more like not 
> > > checking out a particular part of the tree.
> > 
> > If you do not want to fetch all of the supermodule then do not fetch the
> > supermodule.
> 
> So why do you want to limit it? There's absolutely no cost to saying "I 
> want to see all the common shared infrastructure, but I'm actually only 
> interested in this one submodule that I work with".

So you are for a global submodule namespace in supermodule repositories,
do I understand correctly?

Otherwise, how would you specify the submodules at clone time given the
ability that submodule roots can have relative path changed arbitrarily
between commits?



* Re: [RFC] Submodules in GIT
  2006-12-01 23:09                                                                                 ` Linus Torvalds
  2006-12-01 23:36                                                                                   ` Josef Weidendorfer
@ 2006-12-01 23:49                                                                                   ` sf
  2006-12-02 18:57                                                                                     ` Torgil Svensson
  2006-12-02 20:12                                                                                   ` Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: sf @ 2006-12-01 23:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git, Martin Waitz

Linus Torvalds wrote:
> 
> On Fri, 1 Dec 2006, sf wrote:
>> Linus Torvalds wrote:
>> ...
>>> In contrast, a submodule that we don't fetch is an all-or-nothing 
>>> situation: we simply don't have the data at all, and it's really a matter 
>>> of simply not recursing into that submodule at all - much more like not 
>>> checking out a particular part of the tree.
>> If you do not want to fetch all of the supermodule then do not fetch the
>> supermodule.
> 
> So why do you want to limit it? There's absolutely no cost to saying "I 
> want to see all the common shared infrastructure, but I'm actually only 
> interested in this one submodule that I work with".

If you need a common infrastructure to be able to work with the
submodule, then the submodule is not independent of the supermodule.
I see a contradiction in your requirements.

> Also, anybody who works on just the build infrastructure simply may not 
> care about all the submodules. The submodules may add up to hundreds of 
> gigs of stuff. Not everybody wants them. But you may still want to get the 
> common build infrastructure.

See above.

> In other words, your "all or nothing" approach is
>  (a) not friendly
> and
>  (b) has no real advantages anyway, since modules have to be independent 
>      enough that you _can_ split them off for other reasons anyway.
> 
> So forcing that "you have to take everything" mentality only has 
> negatives, and no positives. Why do it?

(There have been lots of use cases for shallow clones but for a long
time git did not support them).

If you can extend this partial fetch feature to the non-subproject case
I would agree with your reasoning. What makes the subprojects so special
in this regard? Do I have to turn a plain tree into a subproject to be
able to ignore it? Once you can restrict fetches to parts of the
contents you get the ability to restrict fetches to the "common
infrastructure" and selected submodules for free.

Regards

Stephan


* Re: [RFC] Submodules in GIT
  2006-12-01 23:36                                                                                   ` Josef Weidendorfer
@ 2006-12-02  0:12                                                                                     ` Linus Torvalds
  2006-12-02  9:22                                                                                       ` Andy Parkins
                                                                                                         ` (3 more replies)
  0 siblings, 4 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02  0:12 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: sf, git, Martin Waitz

On Sat, 2 Dec 2006, Josef Weidendorfer wrote:
> 
> So you are for a global submodule namespace in supermodule repositories,
> do I understand correctly?
> 
> Otherwise, how would you specify the submodules at clone time given the
> ability that submodule roots can have relative path changed arbitrarily
> between commits?

The only _true_ namespace would be the SHA1 of the commit (and maybe allow 
a pointer to a tag too, but the namespace ends up being the same).

How to _find_ a repository that contains that SHA1 must be left to higher 
levels. After all, repositories move around, and the place you found them 
originally is not a stable name.

So within the supermodule, on a "git object" level, a submodule should 
just be named by the SHA1 that was its HEAD when it was committed within 
the supermodule. So in the "tree object", you'd see something like the 
following when you go "git ls-tree HEAD" on the superproject:

	...
	100644 blob 08602f522183dc43787616f37cba9b8af4e3dade	xdiff-interface.c
	100644 blob 1346908bea31319aabeabdfd955e2ea9aab37456	xdiff-interface.h
	040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2	xdiff
	050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704	xyzzy

where that 050000 is the new magic type (I picked one out of my *ss: it's 
not a valid type for a file mode, so it's a good choice, but it could be 
anything that cannot conflict with a real file), which just specifies the 
"link" part. The SHA1 is the SHA1 of the commit, and the "xyzzy" is 
obviously just the name within the directory of the submodule.

That's all that is actually required for a lot of git commands that 
already expect all objects to be available (ie "git checkout", "git diff" 
etc).
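
A minimal sketch of how a tool might pick the "link" entries out of such a
listing. The 050000 mode and the "link" type are the placeholders proposed
in this message, not something git defines, and the helper names are
hypothetical.

```python
# Hypothetical mode for a submodule "link" entry, taken from the message
# above. It is an assumption of this sketch, not a mode git defines.
LINK_MODE = "050000"


def parse_tree_line(line):
    """Split one 'mode type sha1<TAB>name' line into its four fields."""
    meta, name = line.split("\t", 1)
    mode, kind, sha1 = meta.split()
    return mode, kind, sha1, name


def submodule_links(listing):
    """Map entry name -> commit SHA1 for every link entry in a listing."""
    links = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        mode, kind, sha1, name = parse_tree_line(line)
        if mode == LINK_MODE:
            links[name] = sha1
    return links


SAMPLE = (
    "100644 blob 08602f522183dc43787616f37cba9b8af4e3dade\txdiff-interface.c\n"
    "100644 blob 1346908bea31319aabeabdfd955e2ea9aab37456\txdiff-interface.h\n"
    "040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2\txdiff\n"
    "050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704\txyzzy\n"
)
print(submodule_links(SAMPLE))
```

Commands that already have every object locally only need this much: the
entry name says where the submodule lives, and the SHA1 says which commit
of it belongs to this version of the superproject.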

It only gets interesting for commands that fetch new objects, ie do a 
"pull/fetch" op, and you'd need to know where/how to fetch new objects for 
the xyzzy subproject, so that's a "naming" issue. You have a few choices:

 - get all the objects directly from the subproject as if it was one big 
   project.

   I actually think this sucks. Why? Because it puts an insane load on the 
   server side, which basically needs to traverse the object list of the 
   _sum_ of all projects. An initial clone (or a really big pull, which 
   comes to the same thing) would be absolutely horrendous

So I'd strongly argue against that approach, for scalability reasons. So 
instead, you should really try to do pulls etc one git repo at a time:

 - take the "list of subprojects" from the supermodule, and pull them all 
   one by one.

   This again makes subprojects "less seamless", and makes each subproject 
   more of a separate thing, with the project list gotten from the 
   superproject and parsed separately. But it means you have none of the 
   scalability problems, since you never see things as one huge project 
   with millions of files and even more objects.

The second approach also means that you can see the "supermodule" support 
in git as less of a "plumbing" thing, and it's largely just a thin veneer 
around the core plumbing that really doesn't understand about multiple 
repositories at all (apart from the single "link" extension in the tree 
object), and it's really just scripting to get the subprojects to "look" 
like one thing, when they really are pretty much independent.


* Re: [RFC] Submodules in GIT
  2006-12-01 23:30                                                                                 ` Linus Torvalds
@ 2006-12-02  0:14                                                                                   ` Josef Weidendorfer
  2006-12-02  0:33                                                                                     ` Linus Torvalds
  0 siblings, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-02  0:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git, Martin Waitz

On Saturday 02 December 2006 00:30, Linus Torvalds wrote:
> On Fri, 1 Dec 2006, Josef Weidendorfer wrote:
> > 
> > What about my other argument for a submodule namespace:
> > You want to be able to move the relative root path of a submodule
> > inside of your supermodule, but yet want to have a unique name
> > for the submodule:
> > - to be able to just clone a submodule without having to know
> > the current position in HEAD
> 
> Umm? I don't get the issue. A submodule is a git repo in its own right, 
> and you clone it exactly like you'd clone any other repo. It _does_ have a 
> HEAD. It has its own branches. It has everything.

I just thought about the case when you want to clone a submodule directly
out of the supermodule repository, at a given relative path. And that can
be changing.

Of course, every project which happens to be submodule of some supermodule,
also can have its own repository, as it is fully independent. And then,
you of course can clone from it without any knowledge of its relative position
in the supermodule.

> In the trivial case where the submodule doesn't even _have_ any external 
> existence at all (ie it's always maintained as _just_ a submodule), it 
> would probably tend to have just one branch, and a clone would get 
> whatever that branch is, but that's just a degenerate special case of the 
> much richer "this submodule actually has a life of its own" case.

Yes.

> > - more practically, e.g. to be able to name a submodule
> > independent from any current commit you are on in the supermodule,
> > e.g. to be able to store some meta information about a submodule:
> 
> The current commit within the supermodule would be _totally_ invisible to 
> the submodule.

Of course.

Yet, you need some name to store meta information of submodules
into some config file of the supermodule, like whether you want to have
it checked out (see below).

In that case, such a name for a submodule does not have to be global in
the supermodule project...

> > - "Should git allow to commit rewind actions of this submodule
> >    in the supermodule?" (which, AFAICS, exactly has the same
> >    problems as publishing a rewound branch: you will get into
> >    merge hell when you want to pull upstream changes into the
> >    supermodule)
> 
> The only thing that a submodule must NOT be allowed to do on its own is 
> pruning (and its distant cousin "git repack -d"). You must always prune 
> from the supermodule, because the submodule cannot really know on its own 
> what references point into it.

Yes. I just gave an example of a policy some project may want for submodule
handling.

> > - "Should this submodule be checked out?"
> 
> This, I think, requires too much configuration to say separately for every 
> possible submodule, so I would suggest that the way to make that decision 
> is:
> 
>  - "git clone" by default will fetch and check out all submodules (and 
>    obviously they have to be described some way outside of the object 
>    database, just so that you don't have to parse the _whole_ history of 
>    the _whole_ supermodule just to find all possible submodules. So the 
>    supermodule _will_ need some "list of submodules and where to get them" 
>    in a config file or other).

Exactly. And in this list, you have to specify names.

The thing I wanted to discuss is whether such names would need to be globally
unique in the project containing submodules, or not.

If yes, it IMHO makes a lot of sense to introduce "submodule objects" which contain
these submodule names, and which are used as pointers to submodule commits in
supermodule trees.



* Re: [RFC] Submodules in GIT
  2006-12-02  0:14                                                                                   ` Josef Weidendorfer
@ 2006-12-02  0:33                                                                                     ` Linus Torvalds
  2006-12-02  9:27                                                                                       ` Andy Parkins
  2006-12-04 18:56                                                                                       ` Michael K. Edwards
  0 siblings, 2 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02  0:33 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: sf, git, Martin Waitz

On Sat, 2 Dec 2006, Josef Weidendorfer wrote:
> > 
> > The current commit within the supermodule would be _totally_ invisible to 
> > the submodule.
> 
> Of course.
> 
> Yet, you need some name to store meta information of submodules
> into some config file of the supermodule, like whether you want to have
> it checked out (see below).

Yes, you do need to have a list of submodules somewhere, and you'd need to 
maintain that separately. One of the results of having the submodules be 
independent from the supermodule is that it's not all "automatically 
integrated", and thus the supermodule does end up having to have things 
like that maintained separately. 

And yes, if you screw that up, you wouldn't be able to fetch submodules 
properly etc, even if you see the supermodule, and yes, this sounds more 
like the CVS "Entries" kind of file that is more "tacked on" than really 
deeply integrated. But I think the separation is _more_ than worth the 
fact that you can see things being separate.

In fact, I'm very much arguing for keeping things as separate as possible, 
while just integrating to the smallest possible degree (just _barely_ 
enough that you can do things like "git clone" and it will fetch multiple 
repositories and put them all in the right places, and "git diff" and 
friends will do reasonably sane things).

Keep it simple, stupid. 

> >  - "git clone" by default will fetch and check out all submodules (and 
> >    obviously they have to be described some way outside of the object 
> >    database, just so that you don't have to parse the _whole_ history of 
> >    the _whole_ supermodule just to find all possible submodules. So the 
> >    supermodule _will_ need some "list of submodules and where to get them" 
> >    in a config file or other).
> 
> Exactly. And in this list, you have to specify names.

Yes. 

> The thing I wanted to discuss is whether such names would need to be globally
> unique in the project containing submodules, or not.

My preference would be for it to be "local", just because (as I 
mentioned), with mirroring etc, it might well be that you want to fetch 
things from the _closest_ repository. That's really not a global decision, 
it's a local one.

> If yes, it IMHO makes a lot of sense to introduce "submodule objects" which contain
> these submodule names, and which are used as pointers to submodule commits in
> supermodule trees.

You could do it that way, and then it would be global. It would work, and 
in many ways it would probably be "simpler" on a supermodule level.

The advantage of a global namespace is that you can much more easily 
update it - "git fetch" will just fetch the new file(s) that describe the 
subprojects very naturally if they are all global. Putting them in a local 
.git/config file has its advantages (see above), but it also makes it 
very hard to version them, and to update the list - it would have to 
become manual.

There are possibly combinations of the two approaches: have a "global 
namespace" that describes the canonical place to get the subprojects, but 
have some way to add local "translation" of the canonical names into 
locally preferred versions (eg you could just have a way to say "this is 
the local mirror for that global canonical place")

Maybe that would work?
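
A rough sketch of that combination, assuming a versioned table of
canonical locations plus a purely local table of mirror prefixes. The
function name, both table layouts and the example URLs are hypothetical.

```python
def resolve_fetch_url(name, canonical, local_mirrors):
    """Return the URL to fetch subproject `name` from.

    canonical:     subproject name -> canonical URL (shared, versioned)
    local_mirrors: canonical URL prefix -> preferred local prefix
                   (per-clone setting, never versioned)
    """
    url = canonical[name]
    for prefix, replacement in local_mirrors.items():
        if url.startswith(prefix):
            # Keep the rest of the canonical URL, swap only the prefix.
            return replacement + url[len(prefix):]
    return url


canonical = {"xyzzy": "git://example.org/pub/xyzzy.git"}
mirrors = {"git://example.org/": "git://mirror.example.net/"}

print(resolve_fetch_url("xyzzy", canonical, mirrors))
print(resolve_fetch_url("xyzzy", canonical, {}))
```

The canonical table can be updated by an ordinary fetch because it is
shared, while the mirror table stays a local decision that nobody else
ever sees.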


* Re: [RFC] Submodules in GIT
  2006-12-02  0:12                                                                                     ` Linus Torvalds
@ 2006-12-02  9:22                                                                                       ` Andy Parkins
       [not found]                                                                                         ` <200612021255.59972.Josef.Weidendorfer@gmx.de>
  2006-12-02 11:32                                                                                       ` Josef Weidendorfer
                                                                                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 252+ messages in thread
From: Andy Parkins @ 2006-12-02  9:22 UTC (permalink / raw)
  To: git

On Saturday 2006, December 02 00:12, Linus Torvalds wrote:

> 	100644 blob 08602f522183dc43787616f37cba9b8af4e3dade	xdiff-interface.c
> 	100644 blob 1346908bea31319aabeabdfd955e2ea9aab37456	xdiff-interface.h
> 	040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2	xdiff
> 	050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704	xyzzy
>
> where that 050000 is the new magic type (I picked one out of my *ss: it's
> not a valid type for a file mode, so it's a good choice, but it could be
> anything that cannot conflict with a real file), which just specifies the
> "link" part. The SHA1 is the SHA1 of the commit, and the "xyzzy" is
> obviously just the name within the directory of the submodule.

Can I argue that the hash in that object should actually be to a real object 
in the supermodule repository rather than a link?  Then THAT object would 
contain the hash?  So in your above example:

  100644 blob 08602f522183dc43787616f37cba9b8af4e3dade	xdiff-interface.c
  100644 blob 1346908bea31319aabeabdfd955e2ea9aab37456	xdiff-interface.h
  040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2	xdiff
  050000 link a7f26495b7b7e32bf949efbd91ee32267b792cba	xyzzy

And then the local object a7f26495b7b7e32bf949efbd91ee32267b792cba would 
contain your original hash 0215ffb08ce99e2bb59eca114a99499a4d06e704.

The reason I suggest this is that without it the "link" entry holds the only 
hash in the tree that doesn't point to a valid object.  The contents of an 
object are entirely arbitrary, so it's perfectly okay for that object to 
contain a hash that won't dereference to a real object in the supermodule.

The main advantage of this is (I think) that git-prune, git-fsck, and whatever 
else relies on tree objects all being real, don't need to be modified at all.

It also gives you scope to later add fields to the "link" object if you 
wanted.
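
A minimal sketch of that indirection, using an in-memory dictionary as the
object store. Objects are hashed the way git hashes them (SHA1 over
"<type> <size>", a NUL byte, then the content), but the "link" object type
and both function names are assumptions made for illustration.

```python
import hashlib


def store(objects, kind, payload):
    """Store an object and return its SHA1 (git-style header + content)."""
    data = kind.encode() + b" " + str(len(payload)).encode() + b"\0" + payload
    sha1 = hashlib.sha1(data).hexdigest()
    objects[sha1] = (kind, payload)
    return sha1


def resolve_link(objects, link_sha1):
    """Follow a local "link" object to the submodule commit it names."""
    kind, payload = objects[link_sha1]
    if kind != "link":
        raise ValueError("not a link object: " + link_sha1)
    return payload.decode().strip()


objects = {}
submodule_commit = "0215ffb08ce99e2bb59eca114a99499a4d06e704"

# The tree entry would record link_id, which always exists locally.
link_id = store(objects, "link", submodule_commit.encode() + b"\n")
print(link_id in objects)
print(resolve_link(objects, link_id))
```

Every hash reachable from the tree then names an object that is really
present in the supermodule, and the submodule commit is only ever seen as
payload, so tools that walk trees need no special case for it.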

Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE


* Re: [RFC] Submodules in GIT
  2006-12-02  0:33                                                                                     ` Linus Torvalds
@ 2006-12-02  9:27                                                                                       ` Andy Parkins
  2006-12-04 18:56                                                                                       ` Michael K. Edwards
  1 sibling, 0 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-02  9:27 UTC (permalink / raw)
  To: git

On Saturday 2006, December 02 00:33, Linus Torvalds wrote:

> Yes, you do need to have a list of submodules somewhere, and you'd need to
> maintain that separately. One of the results of having the submodules be

Why?  You just recursively search for every "link" object in the supermodule.  
That tells you which submodules you need and where they should be.

During a supermodule clone, it can tell the client end to start a new clone 
with the correct path because it knows what the local path is at that moment.
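
A sketch of that recursive search, assuming the supermodule tree has
already been loaded into nested dictionaries. The in-memory layout and the
"link" entry kind are hypothetical stand-ins for real tree objects.

```python
def find_links(tree, prefix=""):
    """Return {path: submodule commit} for every link entry in a tree.

    `tree` maps an entry name to a (kind, value) pair, where value is a
    nested dict for kind "tree" and a SHA1 string otherwise.
    """
    found = {}
    for name, (kind, value) in sorted(tree.items()):
        path = prefix + name
        if kind == "link":
            found[path] = value
        elif kind == "tree":
            found.update(find_links(value, path + "/"))
    return found


supermodule = {
    "Makefile": ("blob", "08602f522183dc43787616f37cba9b8af4e3dade"),
    "xyzzy": ("link", "0215ffb08ce99e2bb59eca114a99499a4d06e704"),
    "lib": ("tree", {
        "inner": ("link", "959dd5d97e665998eb26c764d3a889ae7903d9c2"),
    }),
}
print(find_links(supermodule))
```

The result gives both pieces of information the clone needs for one
supermodule commit: which submodule commits are referenced and the paths
where they belong at that moment.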



Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE


* Re: [RFC] Submodules in GIT
  2006-12-01 22:08                                                                               ` Martin Waitz
@ 2006-12-02 10:04                                                                                 ` Andy Parkins
  2006-12-02 13:50                                                                                   ` Josef Weidendorfer
  2006-12-02 20:40                                                                                   ` Martin Waitz
  0 siblings, 2 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-02 10:04 UTC (permalink / raw)
  To: git

On Friday 2006, December 01 22:08, Martin Waitz wrote:

> > echo $SUBMODULE_HASH >
> > submodule/.git/refs/supermodules/commit$SUPERMODULE_HASH
>
> I guess you are aware that you have to scan _all_ trees inside _all_
> supermodule commits for possible references.

No you don't; you do it as part of the appropriate normal operations.

 * supermodule commit - scan the current tree for "link" objects in the
   tree.  If you find one write the reference in the submodule.
 * adding a new submodule - if this is a new submodule there can't be any
   references in the supermodule already.
 * cloning a supermodule - every new commit that gets written in the 
   supermodule gets checked for "link" objects.
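
The commit-time step in the first bullet could look roughly like this. The
refs/supermodules/commit<hash> layout follows the echo command quoted
above; the function name and arguments are hypothetical, and the sketch
only writes plain ref files without calling git.

```python
import os


def record_supermodule_refs(worktree, supermodule_commit, links):
    """Pin each linked submodule commit with a ref inside the submodule.

    links maps a submodule path (relative to worktree) to the submodule
    commit recorded by this supermodule commit. Returns the ref files
    that were written.
    """
    written = []
    for path, submodule_commit in sorted(links.items()):
        ref_dir = os.path.join(worktree, path, ".git", "refs", "supermodules")
        os.makedirs(ref_dir, exist_ok=True)
        ref_path = os.path.join(ref_dir, "commit" + supermodule_commit)
        with open(ref_path, "w") as ref_file:
            ref_file.write(submodule_commit + "\n")
        written.append(ref_path)
    return written
```

Because only the links of the commit being created are written, the work
is proportional to the number of submodules, not to the length of the
supermodule history.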

> So what do you do with deleted submodules?
> You wouldn't want them to still sit around in your working directory,
> but you still have to preserve them.

Now that is a tricky one.  Mind you, I think that problem exists for any 
implementation.  I haven't got a good answer for that.

Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE


* Re: [RFC] Submodules in GIT
  2006-12-02  0:12                                                                                     ` Linus Torvalds
  2006-12-02  9:22                                                                                       ` Andy Parkins
@ 2006-12-02 11:32                                                                                       ` Josef Weidendorfer
  2006-12-02 19:52                                                                                         ` Linus Torvalds
  2006-12-02 20:18                                                                                       ` Martin Waitz
  2006-12-03 22:16                                                                                       ` Sven Verdoolaege
  3 siblings, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-02 11:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, git, Martin Waitz

On Saturday 02 December 2006 01:12, Linus Torvalds wrote:
> On Sat, 2 Dec 2006, Josef Weidendorfer wrote:
> > 
> > So you are for a global submodule namespace in supermodule repositories,
> > do I understand correctly?
> > 
> > Otherwise, how would you specify the submodules at clone time given the
> > ability that submodule roots can have relative path changed arbitrarily
> > between commits?
> 
> The only _true_ namespace would be the SHA1 of the commit (and maybe allow 
> a pointer to a tag too, but the namespace ends up being the same).

I am not so sure about this.
Perhaps we want the namespace to be more than the space of commit ids.

Suppose you have some superproject which uses two compiler major versions
(GCC 3 and GCC 4) as submodules because you want to have your regression
test suite run with both major versions.
So you would have a submodule at path "gcc3/" and "gcc4/" in your supermodule.
As both the gcc 3 and gcc 4 are branches from the same project, the submodule
links will go into a connected DAG (suppose GCC uses git).

From the commit link alone, it is not easy to see which submodule it belongs
to (at least from a practical point of view).

So it actually _is_ more information if the proposed link objects in the supermodule
contain some submodule ID they belong to. They only need be unique in the scope of
the superproject (not really globally unique).

So another argument for submodule names: Merging. Otherwise, how do you decide
which submodule a link belongs to, especially in the scope of the above example?

> How to _find_ a repository that contains that SHA1 must be left to higher 
> levels. After all, repositories move around, and the place you found them 
> originally is not a stable name.

I did not talk about a special format for these submodule IDs yet.
We could use a URL, but with such a value a user automatically associates
some semantics which can be confusing, as repository URLs can change. 

We can use some symbolic name which has some meaning in the scope of
the superproject, and is specified at submodule creation, like "gcc3" or
"gcc4". However, this is a local decision of the person who is importing
the submodule. So if two developers of the same project using supermodules
independently decide that the import of "gcc" as submodule is the right
thing, but use slightly different submodule IDs, you will get 2 different
submodules when merging.

I argue that this is even the correct thing, and they should decide about the
name before both are doing the import, or only one imports and the other
pulls.

Another option for a submodule ID could be the root commit of the submodule
commit DAG. This looks nice as such an ID really is globally unique for
projects (more or less: the first commit always contains the time stamp at
creation, and the author/committer email address, even if the tree happens
to be the same because you start with the same dummy file).

But my example above (with the 2 different submodules from the
same GCC project) shows that this does not work. A superproject could
never create different submodules from the same (e.g. GCC) project.

So I just vote for a symbolic name chosen at submodule creation time.
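
A small sketch of why a symbolic name helps when merging, assuming each
link carries a name next to its path and commit. The data layout, the
function name and the shortened commit ids are made up for illustration.

```python
def pair_links_by_name(ours, theirs):
    """Pair submodule links from two supermodule trees by symbolic name.

    Each side maps a symbolic name to a (path, commit) tuple. A name
    missing on one side is paired with None.
    """
    pairs = {}
    for name in sorted(set(ours) | set(theirs)):
        pairs[name] = (ours.get(name), theirs.get(name))
    return pairs


ours = {"gcc3": ("gcc3/", "aaa111"), "gcc4": ("gcc4/", "bbb222")}
theirs = {"gcc3": ("compilers/gcc3/", "aaa111"), "gcc4": ("compilers/gcc4/", "ccc333")}

for name, (mine, other) in pair_links_by_name(ours, theirs).items():
    print(name, mine, other)
```

Both links point into the same connected commit DAG and one side has moved
them to new paths, yet the names still say which entry on one side
corresponds to which entry on the other.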


* Re: [RFC] Submodules in GIT
  2006-12-01 11:00                                                 ` Martin Waitz
  2006-12-01 12:09                                                   ` sf
@ 2006-12-02 12:48                                                   ` Jakub Narebski
  1 sibling, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-02 12:48 UTC (permalink / raw)
  To: git

[cut]

From this discussion I think it follows that the supermodule should track the
HEAD version of the submodule. Perhaps the supermodule index should hold the
sha1 of the submodule commit, so (as usual) you have to update-index in the
supermodule to record changes in the submodule; the difference being that you
update to the HEAD version, not to the working directory version. Or you can
just git-commit -a in the supermodule, which would take the working directory
version of files and the HEAD version of submodules.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 17:33                                                                   ` Stephan Feder
  2006-12-01 18:48                                                                     ` Martin Waitz
  2006-12-01 19:17                                                                     ` Andy Parkins
@ 2006-12-02 13:08                                                                     ` Jakub Narebski
  2 siblings, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-02 13:08 UTC (permalink / raw)
  To: git

Stephan Feder wrote:

> That's it: There is no need for a separate branch or repository. If you 
> have the subproject's commit in the superproject's object database (and 
> we really have that, see 1. and 2. above), why do you _have to_ store it 
> elsewhere?

It would be much simpler to have subproject's commit in subproject object
database, and have it available in superproject's object database by the
way of alternates.

Otherwise, when committing a new submodule state in the supermodule you would
have to fetch all the needed objects (the submodule might have evolved by a
few commits in between) into the superproject's object database.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 21:04                                                                         ` Andy Parkins
  2006-12-01 21:37                                                                           ` Martin Waitz
@ 2006-12-02 13:14                                                                           ` Jakub Narebski
  1 sibling, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-02 13:14 UTC (permalink / raw)
  To: git

Andy Parkins wrote:

>> You still don't have a totally separated repository then, because
>> you can't do a reachability analysis in the submodule repository alone.
> 
> I'm going to guess by reachability analysis, you mean that the submodule 
> doesn't know that some of its commits are referenced by the supermodule.  As 
> I suggested elsewhere in the thread, that's easily fixed by making a 
> refs/supermodule/commitXXXX file for each supermodule commit that references 
> a particular submodule commit.  Then you can git-prune, git-fsck whenever 
> you want.

I think it would be better to resolve this in a universal way by adding
an optional "borrowers" file to the git repository layout, which would
protect against pruning objects that are referenced by repositories
which have the given repository as one of their "alternates".
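
Both ideas (extra refs per supermodule commit, or a "borrowers" file)
amount to adding more roots to the reachability walk that pruning relies
on. A minimal Python sketch of that, using a toy object graph; the function
names and the `borrowed` parameter are hypothetical and do not correspond
to any existing git interface:

```python
def reachable(graph, roots):
    """Return every object reachable from `roots` in `graph`."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(graph.get(obj, ()))
    return seen


def prunable(graph, own_refs, borrowed=()):
    """Objects that no local ref and no borrower-side root keeps alive."""
    return set(graph) - reachable(graph, list(own_refs) + list(borrowed))


# The submodule history was rewritten: "old" is on no local branch any more,
# but a supermodule commit still points at it.
graph = {"new": ["base"], "old": ["base"], "base": []}

without_borrowers = sorted(prunable(graph, own_refs=["new"]))
with_borrowers = sorted(prunable(graph, own_refs=["new"], borrowed=["old"]))
print(without_borrowers, with_borrowers)
```

Without the extra roots the rewritten commit is a pruning candidate; with
them it survives.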

By the way, how does one slurp all the objects from alternates into the
repository's own object database?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 10:04                                                                                 ` Andy Parkins
@ 2006-12-02 13:50                                                                                   ` Josef Weidendorfer
  2006-12-02 20:43                                                                                     ` Martin Waitz
  2006-12-02 20:40                                                                                   ` Martin Waitz
  1 sibling, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-02 13:50 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

On Saturday 02 December 2006 11:04, Andy Parkins wrote:
> > So what do you do with deleted submodules?
> > You wouldn't want them to still sit around in your working directory,
> > but you still have to preserve them.
> 
> Now that is a tricky one.  Mind you, I think that problem exists for any 
> implementation.  I haven't got a good answer for that.

That suggests that it is probably better to separate submodule repositories
from their checked out working trees. Why not put the GITDIRs of the submodules
in subdirectories of the supermodule's GITDIR instead?


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 12:37                                                                 ` Martin Waitz
@ 2006-12-02 15:16                                                                   ` Jakub Narebski
  0 siblings, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-02 15:16 UTC (permalink / raw)
  To: git

Martin Waitz wrote:

> hoi :)
> 
> On Fri, Dec 01, 2006 at 12:20:42PM +0000, Andy Parkins wrote:
>> Is there a public repository I can look at to see what you've done?
>> I'm interested in the sort of plumbing changes needed to make
>> something like this work.
> 
> link is in the mail that started this thread ;-).

And on GitWiki as well:
  http://git.or.cz/gitwiki/SubprojectSupport

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 23:49                                                                                   ` sf
@ 2006-12-02 18:57                                                                                     ` Torgil Svensson
  2006-12-02 19:41                                                                                       ` Linus Torvalds
  0 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-02 18:57 UTC (permalink / raw)
  To: sf-gmane, Linus Torvalds, sf, git, Martin Waitz

> If you need a common infrastructure to be able to work with the
> submodule, then the submodule is not independent of the supermodule.
> I see a contradiction in your requirements.

Here's a real-world example that doesn't contradict:

http://amarok.kde.org/wiki/Installation_HowTo#From_Anonymous_SVN

"svn co -N svn://anonsvn.kde.org/home/kde/trunk/extragear/multimedia
cd multimedia
svn co svn://anonsvn.kde.org/home/kde/branches/KDE/3.5/kde-common/admin
svn up amarok

To compile the sources (from the multimedia directory):"

and there are probably very few people who want to clone the entire KDE
multimedia sub&super-module in this case.

//Torgil


On 12/2/06, sf <sf-gmane@stephan-feder.de> wrote:
> Linus Torvalds wrote:
> >
> > On Fri, 1 Dec 2006, sf wrote:
> >> Linus Torvalds wrote:
> >> ...
> >>> In contrast, a submodule that we don't fetch is an all-or-nothing
> >>> situation: we simply don't have the data at all, and it's really a matter
> >>> of simply not recursing into that submodule at all - much more like not
> >>> checking out a particular part of the tree.
> >> If you do not want to fetch all of the supermodule then do not fetch the
> >> supermodule.
> >
> > So why do you want to limit it? There's absolutely no cost to saying "I
> > want to see all the common shared infrastructure, but I'm actually only
> > interested in this one submodule that I work with".
>
> If you need a common infrastructure to be able to work with the
> submodule, then the submodule is not independent of the supermodule.
> I see a contradiction in your requirements.
>
> > Also, anybody who works on just the build infrastructure simply may not
> > care about all the submodules. The submodules may add up to hundreds of
> > gigs of stuff. Not everybody wants them. But you may still want to get the
> > common build infrastructure.
>
> See above.
>
> > In other words, your "all or nothing" approach is
> >  (a) not friendly
> > and
> >  (b) has no real advantages anyway, since modules have to be independent
> >      enough that you _can_ split them off for other reasons anyway.
> >
> > So forcing that "you have to take everything" mentality only has
> > negatives, and no positives. Why do it?
>
> (There have been lots of use cases for shallow clones but for a long
> time git did not support them).
>
> If you can extend this partial fetch feature to the non-subproject case
> I would agree with your reasoning. What makes the subprojects so special
> in this regard? Do I have to turn a plain tree into a subproject to be
> able to ignore it? Once you can restrict fetches to parts of the
> contents you get the ability to restrict fetches to the "common
> infrastructure" and selected submodules for free.
>
> Regards
>
> Stephan
>

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 18:57                                                                                     ` Torgil Svensson
@ 2006-12-02 19:41                                                                                       ` Linus Torvalds
  2006-12-03  9:19                                                                                         ` Torgil Svensson
                                                                                                           ` (3 more replies)
  0 siblings, 4 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02 19:41 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: sf-gmane, sf, git, Martin Waitz

On Sat, 2 Dec 2006, Torgil Svensson wrote:
> 
> Here's a real-world example that doesn't contradict:

And I'll add the note that people who do things like submodules aren't 
generally even _used_ to them being "seamless", and most of the time 
probably don't even want complete seamlessness.

As the example that Torgil points to shows, people are quite used to 
actually even naming the submodules separately, and things like having the 
"default" set of submodules not equal the "complete" set. 

In other words, I don't think people expect or want something hugely more 
complicated than the CVS/modules kind of file. 

What people _do_ want (and that CVS in general is horribly bad at, and 
this is not a module-specific issue) is to have the _versioning_ work 
well. When you check out a specific version of a module, you want any 
_linked_ modules to follow along too.

This is the same reason why CVS users use tags a lot: because even 
_within_ a single project (no modules, no nothing), it's often hard to 
re-create the exact state of a version any other way. So you tag every 
single file and do insane things like that, because CVS just isn't very 
good at guaranteeing consistency across the whole project.

The exact same thing is true about subprojects. I don't think that people 
who have used CVS subprojects a lot really mind the CVS/modules file 
itself (but hey, maybe I'm wrong - I've seen _other_ people maintain 
modules in CVS, but I've never done it myself), but they do mind the fact 
that it's hard as hell to do something as simple as "get all modules back 
to version X" without lots and lots of careful crud (ie tagging every 
single module, things like that).

Now, I'm not exactly sure who wants to use git modules, so this is the 
time to ask: did you hate the CVS/modules file? Or was it something you 
set up once, and then basically forgot about? People clearly use the 
ability to mark certain modules as depending on each other, and aliases to 
say "if you ask for this module, you actually get a set of _these_ 
modules".

_I_ suspect that that isn't the problem people had, and isn't what they 
have any problems with. What CVS didn't do very well (or at all, afaik) is 
to say "I want supermodule version XYZ", and then get all the submodules 
automatically to that (reliable) state. And THAT is something I think is 
really important for submodules, and it's why I think the most important 
part isn't actually all the veneer to make "git clone" and "git pull" work 
(which is really about the CVS/modules kind of wrapper parsing), but 
actually about the supermodule "tree" object pointing to a very specific 
version, so that you get the exact same "atomic snapshotting" of multiple 
trees that you get within a single git tree.

In other words, I _suspect_ that that is really what module users are all 
about. They want the ability to specify an arbitrary collection of these 
atomic snapshots (for releases etc), and just want a way to copy and move 
those things around, and are less interested in making everything else 
very seamless (because most people are happy to do the actual 
_development_ entirely within the submodules, so the "development" part 
is actually not that important for the supermodule, the supermodule is 
mostly for aggregation and snapshots, and tying different versions of 
different submodules together).

So that's where I come from. And maybe I'm totally wrong. I'd like to hear 
what people who actually _use_ submodules think.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 23:34                                                                       ` sf
@ 2006-12-02 19:46                                                                         ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 19:46 UTC (permalink / raw)
  To: sf; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3633 bytes --]

hoi :)

On Sat, Dec 02, 2006 at 12:34:31AM +0100, sf wrote:
> > Now your submodule is no longer seen as an independent git repository
> > and I think this would cause problems when you want to push/pull between
> > the submodule and its upstream repository.
> 
> You can always pick a single commit or several commits out of a larger
> repository and have a complete git repository.
> 
> And I already explained how to push and pull even from within superprojects.

Sure, you are able to make it work, but it needs more work on the UI part.
How do you handle the index? How do you allow to clone only the
submodule?

I really thought about such a setup too, but then decided that it is
much easier to work with submodules when you can really see it as a
repository of its own.

> > But you could still call the "xdiff" part of the git repository a
> > submodule.  And then changes to the xdiff directory result in a new
> > submodule commit, even when there is no direct reference to it.
> > So you'd still "commit to the xdiff submodule".
> 
> Let's make certain that we understand each other. I see a clear
> distinction between the submodule code in a supermodule branch (commits
> in the supermodule's tree and nothing else) and submodule branches which
> are independent of the superproject. Supermodule branches and submodule
> branches do not interact, only if I want them to.

Agreed.
I think the thing which caused some discussion is that I make the
current submodule commit which is used by the supermodule available in a
refs/head in the submodule.
So there is one "branch" in the submodule which corresponds to the
version used by the supermodule, but this is just for user interface.
Its most important purpose is to give this special commit a name, so
that it can be used in merges, etc.

By selecting another refs/heads "branch" in the submodule you can also
easily detach the submodule from the supermodule.
It is really important to understand that you can't branch the submodule
alone and still have it connected to the supermodule, because the
supermodule always tracks only one commit for each submodule.
So every branch that affects the project has to be done on project
(topmost supermodule) level.
But of course the submodule can have other branches which are not
tracked by the supermodule.
So by checking out refs/heads/master (as it is used in my
implementation) you can attach the submodule to the supermodule (attach
as in: bring the working directory in sync with the whole project), and
you can detach it by selecting another refs/heads (the submodule is
still part of the supermodule, but not in the state which is currently
visible in the working directory).
This may sound confusing, but it really is the only semantic for
submodule branches that makes sense.
There are fears that you may commit something that does not match your
current working directory.  Sure, but you explicitly asked for it and I
think it won't be a problem if git-status tells about this fact.

> The double slashes is the only way I can think of that clearly indicates
> that I do not mean the contents named by the path, but the commit that
> you find there. Once you have named a commit in that way, you can
> continue to apply other revision naming suffixes, paths, and so on.

With the current semantics, you can already get to the submodule commit
(just leave out your double slashes), but what is missing is simply to
apply all the modifiers again on this submodule commit.
So I think we can do without the double slashes.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 11:32                                                                                       ` Josef Weidendorfer
@ 2006-12-02 19:52                                                                                         ` Linus Torvalds
  2006-12-02 20:21                                                                                           ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02 19:52 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: sf, Git Mailing List, Martin Waitz, Andy Parkins

On Sat, 2 Dec 2006, Josef Weidendorfer wrote:
> > 
> > The only _true_ namespace would be the SHA1 of the commit (and maybe allow 
> > a pointer to a tag too, but the namespace ends up being the same).
> 
> I am not so sure about this.
> Perhaps we want the namespace to be more than the space of commit ids.

I don't think it would be wrong at all to have a "link object" type, and 
have the "link" tree entry actually point to that "link object" instead of 
pointing directly to the commit in the submodule.

And yes, that extra indirection would allow for more flexibility (the 
"link object" can contain comments about the particular version used, 
pointers to where you can get it - whether human-readable or strictly 
meant for automation - etc etc).

So I agree with Andy Parkins' comment about the link object allowing not 
only extended namespaces, but also allowing a certain amount of 
flexibility (ie there's some built-in extensibility and ability to perhaps 
add future fields if there's a new object type).

I just want the naming of the links themselves to use all the same SHA1 
hashes etc, so that you always have a very explicit, and very trustworthy 
version - and never end up in the situation that you know which repository 
you want at that position, but you don't know exactly which commit in that 
repo was supposed to be checked out with that particular version of the 
super-module.
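
As a rough illustration of such a "link object", here is a Python sketch.
The object type "link", its field layout and the helper make_link are
purely hypothetical; only the hashing scheme (SHA-1 over a header of type,
size and a NUL byte, followed by the body) is the one git uses for its
existing object types. The point is that the link gets a SHA1 name like
everything else while still pinning one exact submodule commit.

```python
import hashlib


def object_id(obj_type, body):
    """Hash `body` git-style: type, space, size, NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def make_link(commit_sha1, **extra):
    """Build the body of a hypothetical link object.

    The first line pins the submodule commit; any further fields
    (comments, hints) are optional and sorted for a stable hash.
    """
    lines = [f"commit {commit_sha1}"]
    lines += [f"{key} {value}" for key, value in sorted(extra.items())]
    return ("\n".join(lines) + "\n").encode()


link_a = make_link("ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7",
                   comment="version used for the release")
link_b = make_link("a57a33b81ac6c9cb5ec0c833edc21bd66428d976",
                   comment="version used for the release")

print(object_id("link", link_a))
print(object_id("link", link_b))
```

Changing the pinned commit changes the link's own name, so the supermodule
tree still identifies one exact version.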

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 23:09                                                                                 ` Linus Torvalds
  2006-12-01 23:36                                                                                   ` Josef Weidendorfer
  2006-12-01 23:49                                                                                   ` sf
@ 2006-12-02 20:12                                                                                   ` Martin Waitz
  2 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 20:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, sf, git

[-- Attachment #1: Type: text/plain, Size: 979 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 03:09:40PM -0800, Linus Torvalds wrote:
> On Fri, 1 Dec 2006, sf wrote:
> > If you do not want to fetch all of the supermodule then do not fetch the
> > supermodule.
> 
> So why do you want to limit it? There's absolutely no cost to saying "I 
> want to see all the common shared infrastructure, but I'm actually only 
> interested in this one submodule that I work with".

An interesting way to support this "only fetch some modules" use-case is
to use several supermodules.

So you could have one supermodule which is geared towards developers and
only contains the modules they use.  Another supermodule contains all
the toolchain sources.  And then there is the supermodule used for
releases which is just a merge of all the other supermodules.

The concept is so flexible that you don't have to introduce lots of
other things such as module namespaces.  Just use the tools you have in a
creative way ;-)

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
                   ` (3 preceding siblings ...)
  2006-11-22  5:29 ` Petr Baudis
@ 2006-12-02 20:16 ` Jakub Narebski
  2006-12-03  1:24   ` Robin Rosenberg
  4 siblings, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-12-02 20:16 UTC (permalink / raw)
  To: git

A few thoughts on this topic. Some of them repeat what
was said earlier.

1. Submodule (subproject) as commit-in-a-tree

Let's try to paint a little diagram (attribution missing):

belonging to:
/--------- supermodule -------\    /---- submodule -------\

commit -> tree +-> blob
  |            +-> tree -> ...
  |            +-----------------> commit -> tree -> ...
  v                                  |
commit -> tree +-> ...               v
  |            +-----------------> commit -> ...
  v                        /         |
commit -> tree +-> ...    /          |
  |            +---------/           v
  |                                commit -> ...
  v                                  |
commit -> tree +-> ...               v
               +-----------------> commit

Both have their independent history, but they are linked as some
submodule versions are part of the supermodule tree.

2. Working area for project with submodules

Submodule as separate repository model
supermodule
+ .git/  <------------------------.
  + HEAD                          |
  + index                         |
  + objects/                      |
  + objects/info/alternates ---.  |
+ subdir1/                     |  |
  + sub1file                   |  ^                   
+ submodule/                   |  |
  + .git                       v  |
    + HEAD                     |  |
    + index                    |  |
    + objects/  <--------------'  |
    +[objects/info/borrowers] ----'
  + subsubdir/
    + submfile
+ file

Embedded submodule model
supermodule
+ .git/
  + HEAD
  + index
  + objects/
  +[refs/submodules/submodule/HEAD]
  +[refs/submodules/submodule/index]
+ subdir1/
  + sub1file
+ submodule/
  + subsubdir/
    + submfile
+ file

The [fictional] borrowers file is for git-prune and friends (also
git-repack with -d option) to not remove objects needed by supermodule
(when for example submodule history got rewritten). But you can do
without it, as long as you don't rewind or don't prune in
supermodule.

The problem with submodule as separate git repository is that if you
move submodule (subproject) somewhere else in the repository (or just
rename it), you have to update alternates file... and this happens not
only on move itself, but also on checkout and reset. But that can be
managed by having in alternates all possible places the submodule ends
up in. I don't know if it is truly a problem.

An alternative solution would be to have submodule objects [also] in the
main (superproject) object database (for example fetched from the
submodule object repository on a supermodule commit that changes the
submodule).

Perhaps instead of objects/info/alternates we should use
objects/info/modules, or even a modules file (in the top .git dir).

The problem with embedded submodule model is ensuring that changes in
submodule go to submodule (using submodule refs; at least HEAD and
submodule index). And there are troubles with treating submodule
separately, for example cloning submodule only, or fetching from
submodule only.

3. Output of git-ls-tree and git-ls-files (git-ls-index ;-) for
project with submodules.

$ git ls-tree HEAD
040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2    subdir1
140000 subm ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7    submodule
100644 blob a57a33b81ac6c9cb5ec0c833edc21bd66428d976    file

$ git ls-tree -r -t HEAD
040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2    subdir1
100644 blob 70d8b9838a7333bc5a1edb93cf0e9abdbcf146cc    subdir1/sub1file
140000 subm ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7    submodule
040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2    submodule/subsubdir
100755 blob 6579f06b05c91f00f4f45015894f2bfab1076bf6    submodule/subsubdir/submfile
100644 blob a57a33b81ac6c9cb5ec0c833edc21bd66428d976    file

$ git ls-files --stage
100644 70d8b9838a7333bc5a1edb93cf0e9abdbcf146cc 0   subdir1/sub1file
140000 ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7 0   submodule
100644 a57a33b81ac6c9cb5ec0c833edc21bd66428d976 0   file
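
A tool reading the git ls-tree listing sketched above could pick out the
submodule entries like this. This is a minimal Python sketch: the "140000
subm" notation is the one proposed in this mail, not something an existing
git prints, and the names Entry and parse_ls_tree are made up for the
example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    mode: str
    kind: str
    sha1: str
    path: str


def parse_ls_tree(text):
    """Parse lines of the form '<mode> <type> <sha1> <path>'."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            # Split at most three times so paths may contain spaces.
            mode, kind, sha1, path = line.split(None, 3)
            entries.append(Entry(mode, kind, sha1, path))
    return entries


listing = """\
040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2    subdir1
140000 subm ccddf1d4b0cf7fd3a699d8b33cf5bc4c5c4435b7    submodule
100644 blob a57a33b81ac6c9cb5ec0c833edc21bd66428d976    file
"""

entries = parse_ls_tree(listing)
submodules = [e.path for e in entries if e.kind == "subm"]
print(submodules)
```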

4. Workflow(s) for project with submodules

$ cd submodule
submodule$ edit subsubdir/submfile
submodule$ git update-index subsubdir/submfile  # this updates submodule index
submodule$ git commit -m "Submodule change"     # this changes submodule HEAD
submodule$ cd ..
$ git update-index submodule                    # this updates index to
                                                # submodule HEAD version
$ git commit -m "Change in submodule"           # this updates HEAD

Of course, as usual you should be able to do "git commit -a" to skip
"git update-index". One has to remember that "git update-index
submodule" and "git commit submodule" use the HEAD version of the
submodule, not the working area version.
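
The HEAD-versus-working-area rule can be modelled in a few lines of Python.
This is only a toy model under the assumptions of this mail; the class
Submodule and the function update_index are hypothetical names, not git
internals.

```python
class Submodule:
    def __init__(self, head):
        self.head = head        # last committed state of the submodule
        self.worktree = head    # possibly edited, not yet committed state

    def edit(self, state):
        self.worktree = state

    def commit(self):
        self.head = self.worktree


def update_index(index, path, submodule):
    """Record the submodule's HEAD (never its working area) in the index."""
    index[path] = submodule.head


index = {}
sub = Submodule("c1")

sub.edit("c2")                          # edit files inside the submodule
update_index(index, "submodule", sub)
before = index["submodule"]             # uncommitted edit is not picked up

sub.commit()                            # commit inside the submodule
update_index(index, "submodule", sub)
after = index["submodule"]              # now the new HEAD is recorded

print(before, after)
```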

There was an idea to update superproject index not to HEAD version
but to some specified branch version.

5. Extended sha1 syntax for submodules

For [almost] all commands the commit-in-tree should
be viewed as tree-ish, for example in HEAD:submodule/subsubdir (is a
tree), or HEAD:submodule/subsubdir/submfile (is a blob).

Currently a suffix ':' followed by a path names the blob or tree (or
commit) at the given path in the tree-ish object named by the part
before the colon. You cannot currently use it recursively, i.e. use
<tree-ish>:<path> to refer to a tree (or commit), and use ':' after
that, e.g. <tree-ish>:<path>:<subpath>... well, currently this does not
make much sense, as you can (and have to) use '/' as a separator.

There was a proposal to use '//' as a way to force a commit object in the
tree to be treated as commit-ish, not as a tree, so you can apply all
the extended sha1 machinery suitable for commits like ^, ^n, ~n and
also probably ^@, but perhaps not @{n}. Then making ':' recursive
would be useful, for example:

  HEAD^:submodule//~2:subsubdir/submfile
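
To make the reading order of such an expression explicit, here is a small
Python sketch that only splits it into steps; it does not resolve anything
against a repository. The function name split_spec and the step labels are
made up for this example, and the '//' syntax itself is just the proposal
discussed above.

```python
def split_spec(spec):
    """Split '<rev>:<path>[//<modifiers>]:<path>...' into ordered steps."""
    first, *rest = spec.split(":")
    steps = [("rev", first)]
    for segment in rest:
        # '//' marks "treat the commit found at this path as a commit-ish".
        path, sep, suffix = segment.partition("//")
        steps.append(("path", path))
        if sep:
            steps.append(("as-commit", suffix))
    return steps


steps = split_spec("HEAD^:submodule//~2:subsubdir/submfile")
for step in steps:
    print(step)
```

Read top to bottom: take HEAD^ of the supermodule, look up the path
"submodule", treat the commit found there as a commit and go back two
generations, then look up subsubdir/submfile in that commit's tree.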

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02  0:12                                                                                     ` Linus Torvalds
  2006-12-02  9:22                                                                                       ` Andy Parkins
  2006-12-02 11:32                                                                                       ` Josef Weidendorfer
@ 2006-12-02 20:18                                                                                       ` Martin Waitz
  2006-12-02 20:44                                                                                         ` Linus Torvalds
  2006-12-03 22:16                                                                                       ` Sven Verdoolaege
  3 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 20:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, sf, git

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

hoi :)

On Fri, Dec 01, 2006 at 04:12:10PM -0800, Linus Torvalds wrote:
> It only gets interesting for commands that fetch new objects, ie do a 
> "pull/fetch" op, and you'd need to know where/how to fetch new objects for 
> the xyzzy subproject, so that's a "naming" issue. You have a few choices:
> 
>  - get all the objects directly from the subproject as if it was one big 
>    project.
> 
>    I actually think this sucks. Why? Because it puts an insane load on the 
>    server side, which basically needs to traverse the object list of the 
>    _sum_ of all projects. An initial clone (or a really big pull, which 
>    comes to the same thing) would be absolutely horrendous

I don't buy your scalability argument.
By dividing the object traversal in separate steps you do not win
anything.  The complexity of the operation still stays the same, as you
still have to traverse the exact same amount of objects.

By separating the repositories you just make reachability analysis
totally awkward, without winning anything.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 19:52                                                                                         ` Linus Torvalds
@ 2006-12-02 20:21                                                                                           ` Martin Waitz
  2006-12-02 20:46                                                                                             ` Linus Torvalds
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 20:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, sf, Git Mailing List, Andy Parkins

[-- Attachment #1: Type: text/plain, Size: 861 bytes --]

hoi :)

On Sat, Dec 02, 2006 at 11:52:13AM -0800, Linus Torvalds wrote:
> I don't think it would be wrong at all to have a "link object" type, and 
> have the "link" tree entry actually point to that "link object" instead of 
> pointing directly to the commit in the submodule.
> 
> And yes, that extra indirection would allow for more flexibility (the 
> "link object" can contain comments about the particular version used, 
> pointers to where you can get it - whether human-readable or strictly 
> meant for automation - etc etc).

What makes a submodule so special that now we suddenly have to store
that stuff in the object database?

Storing a fetch location would grossly contradict the distributed nature
of git.  I really do not see _any_ reason to store more information than
the commit sha1 of the submodule.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-01 23:17                                                                               ` Josef Weidendorfer
@ 2006-12-02 20:24                                                                                 ` Martin Waitz
  2006-12-03  0:55                                                                                   ` Josef Weidendorfer
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 20:24 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Linus Torvalds, sf, git

[-- Attachment #1: Type: text/plain, Size: 695 bytes --]

hoi :)

On Sat, Dec 02, 2006 at 12:17:44AM +0100, Josef Weidendorfer wrote:
> After some thinking, a submodule namespace even is important for checking
> out only parts of a supermodule, exactly because the root of a submodule
> potentially can change at every commit.

Have you ever thought about the idea that the location may be an
important thing to consider for your decision?

Perhaps the submodule is now used for something else (this is why it was
moved) and that now you'd like to keep it?

Anyway, you can just create several supermodules or implement generic
partial tree support for git.  I do not see any reason to special case
submodules here.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-02 10:04                                                                                 ` Andy Parkins
  2006-12-02 13:50                                                                                   ` Josef Weidendorfer
@ 2006-12-02 20:40                                                                                   ` Martin Waitz
  1 sibling, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 20:40 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git


hoi :)

On Sat, Dec 02, 2006 at 10:04:20AM +0000, Andy Parkins wrote:
> On Friday 2006, December 01 22:08, Martin Waitz wrote:
> 
> > > echo $SUBMODULE_HASH >
> > > submodule/.git/refs/supermodules/commit$SUPERMODULE_HASH
> >
> > I guess you are aware that you have to scan _all_ trees inside _all_
> > supermodule commits for possible references.
> 
> No you don't; you do it as part of the appropriate normal operations.
> 
>  * supermodule commit - scan the current tree for "link" objects in the
>    tree.  If you find one write the reference in the submodule.
>  * adding a new submodule - if this is a new submodule there can't be any
>    references in the supermodule already.
>  * cloning a supermodule, every new commit that gets written in the 
>    supermodule gets checked from "link" objects.

 * removing a branch from the supermodule.
   OK, this is an infrequent operation and it can be handled by redoing
   everything.

I just don't like to duplicate information which is already easily
available.  We'd need far too many special cases just to correctly support
reachability analysis.
KISS.

> > So what do you do with deleted submodules?
> > You wouldn't want them to still sit around in your working directory,
> > but you still have to preserve them.
> 
> Now that is a tricky one.  Mind you, I think that problem exists for any 
> implementation.  I haven't got a good answer for that.

If you just keep it in a shared object repository you don't have any
problems.

Please note that it is not required to keep it in one physical location.
You can still use alternates/whatever to store some objects in another
repository, but you need to be able to access all objects from the
supermodule.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-02 13:50                                                                                   ` Josef Weidendorfer
@ 2006-12-02 20:43                                                                                     ` Martin Waitz
  2006-12-03  1:02                                                                                       ` Josef Weidendorfer
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 20:43 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Andy Parkins, git


hoi :)

On Sat, Dec 02, 2006 at 02:50:45PM +0100, Josef Weidendorfer wrote:
> On Saturday 02 December 2006 11:04, Andy Parkins wrote:
> > > So what do you do with deleted submodules?
> > > You wouldn't want them to still sit around in your working directory,
> > > but you still have to preserve them.
> > 
> > Now that is a tricky one.  Mind you, I think that problem exists for any 
> > implementation.  I haven't got a good answer for that.
> 
> That suggests that it is probably better to separate submodule repositories
> from their checked out working trees. Why not put the GITDIRs of the submodules
> in subdirectories of the supermodules GITDIR instead?

Why not simply use a shared object database instead?

You can still have an alternate pointing to some standalone bare repository of
the submodule if you do not like to store submodule objects in the
supermodule repository.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-02 20:18                                                                                       ` Martin Waitz
@ 2006-12-02 20:44                                                                                         ` Linus Torvalds
  2006-12-02 21:06                                                                                           ` Martin Waitz
                                                                                                             ` (2 more replies)
  0 siblings, 3 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02 20:44 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, sf, git



On Sat, 2 Dec 2006, Martin Waitz wrote:
> 
> I don't buy your scalability argument.

Try it.

Really. Get the mozilla import (450MB project), and clone it on a machine 
with half a gig of RAM or less.

Then, clone a couple of smaller archives that end up being 450MB 
_combined_, but clone them separately.

And watch the memory usage.

> By separating the repositories you just make reachability analysis
> totally awkward, without winning anything.

Trust me. Try it out.



* Re: [RFC] Submodules in GIT
  2006-12-02 20:21                                                                                           ` Martin Waitz
@ 2006-12-02 20:46                                                                                             ` Linus Torvalds
  2006-12-02 20:58                                                                                               ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02 20:46 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, sf, Git Mailing List, Andy Parkins



On Sat, 2 Dec 2006, Martin Waitz wrote:
> 
> What makes a submodule so special that now we suddenly have to store
> that stuff in the object database?

I'm not sure it is. I suspect a pure commit link with just a CVS-style 
"modules" file is sufficient. I'm just saying that I don't think it is 
_wrong_ to possibly want to expand it.



* Re: [RFC] Submodules in GIT
  2006-12-02 20:46                                                                                             ` Linus Torvalds
@ 2006-12-02 20:58                                                                                               ` Martin Waitz
  2006-12-03  1:11                                                                                                 ` Josef Weidendorfer
  0 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 20:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, sf, Git Mailing List, Andy Parkins


hoi :)

On Sat, Dec 02, 2006 at 12:46:01PM -0800, Linus Torvalds wrote:
> On Sat, 2 Dec 2006, Martin Waitz wrote:
> > 
> > What makes a submodule so special that now we suddenly have to store
> > that stuff in the object database?
> 
> I'm not sure it is. I suspect a pure commit link with just a CVS-style 
> "modules" file is sufficient. I'm just saying that I don't think it is 
> _wrong_ to possibly want to expand it.

If we later see that we really want to have it we can always introduce
it later.  I don't think we should do it now if we don't see clear
benefits _now_.

So I was not against the link object itself (initially I wanted to do it
this way, too), only against the information which was proposed to be
stored there.  Up to now I haven't found anything which makes sense to
store next to the submodule commit to define the identity of the
submodule.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-02 20:44                                                                                         ` Linus Torvalds
@ 2006-12-02 21:06                                                                                           ` Martin Waitz
  2006-12-02 21:29                                                                                             ` Linus Torvalds
  2006-12-02 21:22                                                                                           ` Linus Torvalds
  2006-12-03 20:46                                                                                           ` [RFC] Submodules in GIT Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: Martin Waitz @ 2006-12-02 21:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, sf, git


hoi :)

On Sat, Dec 02, 2006 at 12:44:20PM -0800, Linus Torvalds wrote:
> On Sat, 2 Dec 2006, Martin Waitz wrote:
> > 
> > I don't buy your scalability argument.
> 
> Try it.
> 
> Really. Get the mozilla import (450MB project), and clone it on a machine 
> with half a gig of RAM or less.
> 
> Then, clone a couple of smaller archives that end up being 450MB 
> _combined_, but clone them separately.
> 
> And watch the memory usage.

Do I understand you correctly that the problem is not the algorithmic
complexity but that you have to map the objects at once instead of mapping
them in small parts one after the other?

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-02 20:44                                                                                         ` Linus Torvalds
  2006-12-02 21:06                                                                                           ` Martin Waitz
@ 2006-12-02 21:22                                                                                           ` Linus Torvalds
  2006-12-03  2:07                                                                                             ` Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT] Josef Weidendorfer
  2006-12-03 20:46                                                                                           ` [RFC] Submodules in GIT Martin Waitz
  2 siblings, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02 21:22 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, sf, git

On Sat, 2 Dec 2006, Linus Torvalds wrote:
> 
> And watch the memory usage.

Btw, just in case you don't understand _why_ this is true, the fact is, in 
a git repository, quite fundamentally, because we don't have "backlinks" 
at any stage at all, we don't know - and fundamentally _cannot_ know - 
whether we're going to see the same object in the future.

So operations like "git-rev-list --objects" (or, these days, more commonly 
anything that just does the equivalent of that internally using the 
library interfaces - ie "git pack-objects" and friends) VERY FUNDAMENTALLY 
have to hold on to the object flags for the whole lifetime of the whole 
operation.

And you should realize that this is really really fundamental. You can't 
fix it with "smarter memory management". You can't fix it with "garbage 
collection". This is _not_ a result of the fact that we use C and malloc, 
and we don't free those objects, like some people sometimes seem to 
believe.

So garbage collection will never help this kind of situation. It flows 
_directly_ from the fact that our objects are immutable: because they are 
immutable, they don't have any backpointers, because we cannot (and must 
not) add backpointers to an old existing object when a new object is 
created that points to it.

So this really isn't a memory management issue. You could somewhat work 
around it by adding a "caching layer" on top of git, and allow that 
caching layer to modify their cache of old objects (so that they can 
contain back-pointers), but for 99% of all users that would actually make 
performance MUCH WORSE, and it would also be a serious problem for 
coherency issues (one of the things that immutable objects cause is that 
there are basically never any race conditions, while a "caching layer" 
like this would have some serious issues about serialization).

So: the very fundamental nature and choices that were made in git also 
means that when you have something like git-pack-objects that wants to 
walk the whole repo, you will end up with something that remembers EVERY 
SINGLE OBJECT it walked. 

And while I've worked very hard to make the memory footprint of individual 
objects as small as possible, and this means that this all works fine even 
for fairly large databases (especially since very few operations actually 
do this "traverse the whole friggin tree" thing), it does mean that 
there's a very fundamental limit to scalability. You can't just make a 
whole repository a hundred times bigger - because the operations that 
traverse the whole thing will require a hundred times more memory!

Now, in "real" projects, this is not a problem. I can pretty much 
_guarantee_ that memory sizes and hardware will grow faster than projects 
grow. I'm not AT ALL worried about the fact that in ten years, the linux 
kernel repository will likely be two or three times the size it is now. 
Because I'm absolutely convinced that in ten years, the machines we have 
now will be obsolete.

So on any "individual project" basis, the fact that memory requirements 
scale roughly as O(n) in the total repository size is simply not a 
problem. In fact, O(n) is pretty damn good, especially since the constant 
is pretty small (basically 28 bytes per object - and 20 of those bytes 
are the SHA1 that you simply cannot avoid).
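To put rough numbers on the O(n) claim, here is a small sketch in C. The 28
and 20 byte figures are the ones quoted above; the names OBJECT_BYTES,
SHA1_BYTES and traversal_bytes() are made up for illustration and are not
part of git.

```c
#include <stdint.h>

/* Per-object cost as quoted in the mail: about 28 bytes, 20 of them SHA1. */
enum { OBJECT_BYTES = 28, SHA1_BYTES = 20 };

/*
 * Hypothetical helper: bytes of per-object state a full traversal has to
 * keep alive for a repository with nr_objects objects. Linear in n.
 */
uint64_t traversal_bytes(uint64_t nr_objects)
{
	return nr_objects * OBJECT_BYTES;
}
```

Under these figures a repository with one million objects needs about 28 MB
for a full walk, and a repository a hundred times larger needs a hundred
times that.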

But it does mean that supermodules really should NOT be so seamless that 
doing a "git clone" on a supermodule does one _large_ clone. Because it's 
simply going to be better to:

 - when you clone the supermodule, track the commits you need on all 
   submodules (this _may_ be a reason in itself for the "link" object, 
   just so that you can traverse the supermodule object dependencies and 
   know what subobject you are looking at even _without_ having to look at 
   the path you got there from)

 - clone submodules one-by-one, using the list of objects you gathered.

Maybe there are other solutions, but quite frankly, I doubt it. Yes, 
you'll end up "traversing" exactly as many objects either way, but the 
"clone submodules one by one" approach is going to be a _hell_ of a lot more 
memory-efficient, and quite frankly, "memory usage" == "performance"  
under many loads (notably, any load that uses too much memory will _suck_ 
performance-wise, either because of swapping or simply because it will 
throw out caches that "many small invocations" would not have thrown out).

So I guarantee that it's going to be better to do five clones of five 
small repositories over one clone of one big one. If only because you need 
less memory to do the five smaller clones.
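The difference can be sketched as "sum versus maximum". This is only a toy
model in C, assuming the quoted 28 bytes per object and assuming that each
separate clone releases its per-object state before the next one starts
(for example because it runs as its own process). The function names are
invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-object bookkeeping cost quoted in the mail above. */
#define BYTES_PER_OBJECT 28u

/* One big clone: state for every object is live at the same time. */
uint64_t peak_bytes_combined(const uint64_t *nr_objects, size_t nr_repos)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < nr_repos; i++)
		total += nr_objects[i];
	return total * BYTES_PER_OBJECT;
}

/* Clones done one after the other: only the largest one sets the peak. */
uint64_t peak_bytes_separate(const uint64_t *nr_objects, size_t nr_repos)
{
	uint64_t largest = 0;
	size_t i;

	for (i = 0; i < nr_repos; i++)
		if (nr_objects[i] > largest)
			largest = nr_objects[i];
	return largest * BYTES_PER_OBJECT;
}
```

For five repositories of 100k to 500k objects the combined walk peaks at
42,000,000 bytes in this model, the one-by-one walk at 14,000,000. The same
number of objects gets traversed either way; only the peak differs.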


* Re: [RFC] Submodules in GIT
  2006-12-02 21:06                                                                                           ` Martin Waitz
@ 2006-12-02 21:29                                                                                             ` Linus Torvalds
  0 siblings, 0 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-02 21:29 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, sf, git

On Sat, 2 Dec 2006, Martin Waitz wrote:
> 
> Do I understand you correctly that the problem is not the algorithmic
> complexity but that you have to map the objects at once instead of mapping
> them in small parts one after the other?

Not map them, but track their "used" flag. Yes. You can unmap objects any 
time at all (since you can just always re-create them at any time very 
easily and cheaply), but the one thing you CANNOT recreate is the object 
flags. See "struct object", and the "used" and FLAG_BITS in particular.

Almost all git programs need the FLAG_BITS. Something as simple as just 
traversing the commit history needs at a minimum one _single_ bit for each 
object: "Have I already seen this". In reality, you tend to need two or 
three more (ie the UNINTERESTING bit ends up being as important as the 
SEEN bit, because it's what determines whether it's reachable from some 
commit we're _not_ interested in, and in the end that's what allows us to 
not traverse the whole history).

So you need at a MINIMUM to track the bits

	#define SEEN            (1u<<0)
	#define UNINTERESTING   (1u<<1)

and in practice almost everything needs

	#define SHOWN           (1u<<3)

too (SEEN is for deciding whether to _traverse_ something, SHOWN is for 
deciding whether you've already output the data for this, and the 
difference is crucial for any depth-first DAG algorithm, since you need 
to test-and-set the one bit when you first encounter the object, and 
test-and-set the other bit when you "leave" the object).

So three bits are minimal to _any_ git traversal algorithm. Many specific 
issues want more bits (eg the TREECHANGE bit may not be quite as 
fundamental, but it sure ends up being critical for the "track subtree" 
case).
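A minimal sketch of the enter/leave pattern described above, in C. This is
a toy and not git's revision walker: struct toy_object and toy_walk() are
hypothetical, and only the two bit values are taken from the mail. In a
walk this simple the SHOWN test can never fail once SEEN has been checked;
it is kept to show where each bit gets set.

```c
#include <stddef.h>

#define SEEN   (1u<<0)
#define SHOWN  (1u<<3)

/* Hypothetical toy object; git's real "struct object" looks different. */
struct toy_object {
	unsigned flags;
	size_t nr_children;
	struct toy_object **children;
	int id;
};

/*
 * Depth-first walk over a DAG. Writes each object's id to out[] exactly
 * once, children before parents, and returns the new output position.
 */
size_t toy_walk(struct toy_object *obj, int *out, size_t pos)
{
	size_t i;

	if (obj->flags & SEEN)
		return pos;              /* reached again via another path */
	obj->flags |= SEEN;              /* set when first encountered */

	for (i = 0; i < obj->nr_children; i++)
		pos = toy_walk(obj->children[i], out, pos);

	if (!(obj->flags & SHOWN)) {     /* set when leaving the object */
		obj->flags |= SHOWN;
		out[pos++] = obj->id;
	}
	return pos;
}
```

The flags have to stay around until the whole walk is done: without the
SEEN bit, an object with two parents would be walked (and output) twice.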


* Re: [RFC] Submodules in GIT
  2006-12-02 20:24                                                                                 ` Martin Waitz
@ 2006-12-03  0:55                                                                                   ` Josef Weidendorfer
  2006-12-03  6:29                                                                                     ` Martin Waitz
  0 siblings, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-03  0:55 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Linus Torvalds, sf, git

On Saturday 02 December 2006 21:24, Martin Waitz wrote:
> On Sat, Dec 02, 2006 at 12:17:44AM +0100, Josef Weidendorfer wrote:
> > After some thinking, a submodule namespace even is important for checking
> > out only parts of a supermodule, exactly because the root of a submodule
> > potentially can change at every commit.
> 
> have you ever thought about the idea that the location may be an
> important thing to consider for your decision.

Which decision, for what? Sorry, I do not understand.

Do you want to say that relative submodule root paths should be kept fixed
for the whole lifetime of a supermodule?
I.e. a submodule "identity" is bound to its relative path, and when we
move it, it should be seen as deleting it and creating a totally new,
different submodule?

That's fine.
But you have to handle submodule creation/deletion nevertheless. And while
you are at a commit which has a given submodule deleted, you have to
keep the submodule data somewhere - referencing it with a name.
I do not speak here about the object database, that could be combined;
but about all the other files in .git/ of the currently not checked out
submodule.

> Perhaps the submodule is now used for something else (this is why it was
> moved) and that now you'd like to keep it?

Can you give a usage scenario? What do you mean here?

> Anyway, you can just create several supermodules or implement generic
> partial tree support for git.  I do not see any reason to special case
> submodules here.

What should such a general partial tree support look like? I suppose you
want to configure paths which should not be checked out. As long as you
say that a given submodule always has to exist at a given path, you are
right: then, you can say: "Please, do not check out this submodule" which
is the same as "Do not check out this path". 

But I think it is quite restrictive not to allow moving submodules around.
When the supermodule upstream decides to move a submodule, your partial
tree config to not check out a submodule will be lost.
But more importantly, if you made changes to a given submodule and pull from
an upstream that changed the submodule position in the meantime, your changes
will not be carried over to the new position, as the move is seen as the
creation of a totally independent submodule.


* Re: [RFC] Submodules in GIT
  2006-12-02 20:43                                                                                     ` Martin Waitz
@ 2006-12-03  1:02                                                                                       ` Josef Weidendorfer
  0 siblings, 0 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-03  1:02 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Andy Parkins, git

On Saturday 02 December 2006 21:43, Martin Waitz wrote:
> On Sat, Dec 02, 2006 at 02:50:45PM +0100, Josef Weidendorfer wrote:
> > On Saturday 02 December 2006 11:04, Andy Parkins wrote:
> > > > So what do you do with deleted submodules?
> > > > You wouldn't want them to still sit around in your working directory,
> > > > but you still have to preserve them.
> > > 
> > > Now that is a tricky one.  Mind you, I think that problem exists for any 
> > > implementation.  I haven't got a good answer for that.
> > 
> > That suggests that it is probably better to separate submodule repositories
> > from their checked out working trees. Why not put the GITDIRs of the submodules
> > in subdirectories of the supermodules GITDIR instead?
> 
> Why not simply use a shared object database instead?

Sure. I have no problem with this.

But can we go one step further?
AFAICS your submodules store the .git/ directories of submodules directly
at submodule position in the working tree - but you have a link .git/objects
into the object database of the supermodule.
When the user wants to delete the submodule, he would remove this .git/ directory,
too. So you lose the .git/refs of the submodule etc. I would suggest to put
the submodule .git dirs into the .git dir of the supermodule.



* Re: [RFC] Submodules in GIT
  2006-12-02 20:58                                                                                               ` Martin Waitz
@ 2006-12-03  1:11                                                                                                 ` Josef Weidendorfer
  0 siblings, 0 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-03  1:11 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Linus Torvalds, sf, Git Mailing List, Andy Parkins

On Saturday 02 December 2006 21:58, Martin Waitz wrote:
> So I was not against the link object itself (initially I wanted to do it
> this way, too), only against the information which was proposed to be
> stored there.  Up to now I haven't found anything which makes sense to
> store next to the submodule commit to define the identity of the
> submodule.

Isn't it enough reason that a porcelain probably wants to store meta
information for a given submodule, giving the need to put a name/identity
to it?

Josef


* Re: [RFC] Submodules in GIT
  2006-12-02 20:16 ` Jakub Narebski
@ 2006-12-03  1:24   ` Robin Rosenberg
  2006-12-03  1:31     ` Jakub Narebski
  2006-12-03 11:00     ` Jakub Narebski
  0 siblings, 2 replies; 252+ messages in thread
From: Robin Rosenberg @ 2006-12-03  1:24 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Saturday 02 December 2006 21:16, Jakub Narebski wrote:
> The problem with submodule as separate git repository is that if you
> move submodule (subproject) somewhere else in the repository (or just
> rename it), you have to update alternates file... and this happens not
> only on move itself, but also on checkout and reset. But that can be
> managed by having in alternates all possible places the submodule ends
> into. I don't know if it is truly a problem.

A nasty problem with separate repositories for submodules is that when you 
screw up and git complains about everything you try to do, you previously 
could do rm -rf *; git reset --hard and retry whatever you were trying to do. 
With separate repositories your submodules will be resting in /dev/null, 
unless you're very, very careful. 



* Re: [RFC] Submodules in GIT
  2006-12-03  1:24   ` Robin Rosenberg
@ 2006-12-03  1:31     ` Jakub Narebski
  2006-12-03 12:22       ` Robin Rosenberg
  2006-12-03 11:00     ` Jakub Narebski
  1 sibling, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-12-03  1:31 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

On Sunday 03 December 2006 02:24, Robin Rosenberg wrote:
> On Saturday 02 December 2006 21:16, Jakub Narebski wrote:
> > The problem with submodule as separate git repository is that if you
> > move submodule (subproject) somewhere else in the repository (or just
> > rename it), you have to update alternates file... and this happens not
> > only on move itself, but also on checkout and reset. But that can be
> > managed by having in alternates all possible places the submodule ends
> > into. I don't know if it is truly a problem.
> 
> A nasty problem with separate repositories for submodules is that when you 
> screw up and git complains about everything you try to do, you previously 
> could do rm -rf *; git reset --hard and retry whatever you were trying to do. 
> With separate repositories your submodules will be resting in /dev/null, 
> unless you're very, very careful. 

Actually, rm -rf * is not needed for "git reset --hard" or
"git checkout -f" to succeed.

-- 
Jakub Narebski


* Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
  2006-12-02 21:22                                                                                           ` Linus Torvalds
@ 2006-12-03  2:07                                                                                             ` Josef Weidendorfer
  2006-12-03  2:25                                                                                               ` Linus Torvalds
  2006-12-03  2:46                                                                                               ` Shawn Pearce
  0 siblings, 2 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-03  2:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Waitz, sf, git

On Saturday 02 December 2006 22:22, Linus Torvalds wrote:
> So operations like "git-rev-list --objects" (or, these days, more commonly 
> anything that just does the equivalent of that internally using the 
> library interfaces - ie "git pack-objects" and friends) VERY FUNDAMENTALLY 
> have to hold on to the object flags for the whole lifetime of the whole 
> operation.
>
> [...]
> 
> So this really isn't a memory management issue. You could somewhat work 
> around it by adding a "caching layer" on top of git, and allow that 
> caching layer to modify their cache of old objects (so that they can 
> contain back-pointers), but for 99% of all users that would actually make 
> performance MUCH WORSE, and it would also be a serious problem for 
> coherency issues (one of the things that immutable objects cause is that 
> there are basically never any race conditions, while a "caching layer" 
> like this would have some serious issues about serialization).

Thinking about this...
You have to make very sure to always update the caching layer containing
the backlinks on every addition of a further object. You can do this
because you always reached this new object by some other object, which
exactly is the backpointer.

Now let us suppose we are able to do this.
What does this give us?

Take a look at object traversal:
We have to store the flag "already visited" for objects we could reach
again in the traversal. But with the backlinks, we can see that most
of the objects can only be reached via one path, and therefore, there
is no need to store the flag, as it never will be queried in the
further traversal.
(Similar for objects with two paths: When you have visited the object
two times, you can throw away the flag, as it is not queried any more).

Regarding the caching layer and object traversal, it would have been
enough to only store "is this object reachable via more than 1 path?".
For this, the "cache" could be the set of objects reachable with
more than one path.
And such a set stored in a file should be quite manageable, and be
quite small, relative to the size of the object database.

In fact, this "cache" can be created with a usual object traversal
(which has the original memory requirement), but as long as we do
not add objects to the database, further traversals would only need
a fraction of memory.

When only adding a small number of objects, it should be easy to
update the cache; while with big actions like fetching/pulling,
we simply should remove the file with the backlink information.
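A rough sketch of that idea in C, under strong assumptions: the set of
objects with more than one incoming link is already known (here each such
object carries an index into a small flag array, everything else carries
-1), and the walk starts from a single root. struct toy_node and
count_reachable() are invented for this example; nothing like this exists
in git.

```c
#include <stddef.h>

/* Hypothetical node: multi_slot >= 0 only if it has several parents. */
struct toy_node {
	int multi_slot;              /* index into seen[], or -1 */
	size_t nr_children;
	struct toy_node **children;
};

/*
 * Count the objects reachable from n. A "seen" flag is kept only for
 * nodes in the precomputed multi-path set; a node with a single incoming
 * link can be entered at most once, so it needs no flag at all.
 */
size_t count_reachable(struct toy_node *n, unsigned char *seen)
{
	size_t total = 1, i;

	if (n->multi_slot >= 0) {
		if (seen[n->multi_slot])
			return 0;            /* already counted via another path */
		seen[n->multi_slot] = 1;
	}
	for (i = 0; i < n->nr_children; i++)
		total += count_reachable(n->children[i], seen);
	return total;
}
```

In a diamond-shaped graph of four objects only the bottom one needs a flag,
so the flag array has one entry instead of four. The sketch says nothing
about the cost of keeping the multi-path set up to date.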

> problem. In fact, O(n) is pretty damn good, especially since the constant 
> is pretty small (basically 28 bytes per object - and 20 of those bytes 
> are the SHA1 that you simply cannot avoid).

Again only some thoughts...
Pack files are fully self-contained object stores, yes?
So in the scope of a single pack file, the offset of this object is enough
as object identification.
If we could make sure that in any given algorithm touching objects, like
commit traversal, we always have the offset available when we need to do
an object lookup, then, it should be enough to store object flags only
indexed by the offset of this object in the pack.
The translation SHA1 -> offset can be done with the pack index.
As you usually have multiple packs, a (pack number / offset) tuple should
be enough as object ID.
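As a sketch of what such an ID could look like in C: the struct and
function below are hypothetical, and they assume that both the pack number
and the offset fit in 32 bits and that every object of interest lives in
some pack (loose objects have no offset).

```c
#include <stdint.h>

/* Hypothetical compact object ID: which pack, and where inside it. */
struct pack_pos {
	uint32_t pack_nr;   /* index of the pack among the local packs */
	uint32_t offset;    /* byte offset of the object in that pack */
};

/* Fold the tuple into one 64-bit key, e.g. for indexing a flag table. */
uint64_t pack_pos_key(struct pack_pos p)
{
	return ((uint64_t)p.pack_nr << 32) | p.offset;
}
```

Such a key takes 8 bytes where the SHA1 takes 20, but it is only meaningful
inside one repository and only as long as the packs are not rewritten.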

Thinking even one step further:
Would it make sense to define an encoding format for the content of
commit and tree objects inside of packs, where the SHA1 is replaced by the
offset of the object in this pack?
As exactly the SHA1 is the least compressible thing, this could promise
quite a benefit.
AFAIK, we currently only use these offsets for referencing objects in
delta chains.

More about the original topic of this thread (and off-topic to the
new subject):

> But it does mean that supermodules really should NOT be so seamless that 
> doing a "git clone" on a supermodule does one _large_ clone. Because it's 
> simply going to be better to:
> 
>  - when you clone the supermodule, track the commits you need on all 
>    submodules (this _may_ be a reason in itself for the "link" object, 
>    just so that you can traverse the supermodule object dependencies and 
>    know what subobject you are looking at even _without_ having to look at 
>    the path you got there from)
> 
>  - clone submodules one-by-one, using the list of objects you gathered.

Without submodule identities, we would have to clone path-by-path, as
we cannot tell different submodules apart except by their location.


* Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
  2006-12-03  2:07                                                                                             ` Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT] Josef Weidendorfer
@ 2006-12-03  2:25                                                                                               ` Linus Torvalds
  2006-12-03  2:46                                                                                               ` Shawn Pearce
  1 sibling, 0 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-03  2:25 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Martin Waitz, sf, git

On Sun, 3 Dec 2006, Josef Weidendorfer wrote:
> 
> Thinking about this...
> You have to make very sure to always update the caching layer containing
> the backlinks on every addition of a further object. You can do this
> because you always reached this new object by some other object, which
> exactly is the backpointer.

You're missing the big issue.

The issue is that a cache like that would ABSOLUTELY SUCK.

You could speed up the non-common operations with it, but:

 - any changes would become a LOT more expensive to do, because they all 
   need to update every single object they add (ie a "commit" would now 
   have to add backpointers TO EVERY SINGLE BLOB).

   Imagine what this does to something like the kernel, where a commit 
   reaches 22,000 files!

   You can do it at a finer granularity (ie do just the direct backlinks 
   and only do the "tree->blob" and "tree->tree" things rather than the 
   full commit reachability), but it's still going to be MUCH more painful 
   than what we do now.

 - the cache would be a lot bigger than the current pack-files, and it 
   would be fragile as hell to boot. Because it needs to get rewritten for 
   every operation, it gets corrupted much more easily, and that's 
   ignoring things like race conditions, so it would now need a ton of 
   locking that git simply doesn't do at all.

 - everything would basically slow down.

 - you couldn't do shared object databases AT ALL, because backpointers 
   wouldn't work. The whole _reason_ you can share object databases is the 
   same reason we can't have backpointers: objects are immutable and never 
   change depending on circumstances.

The _only_ downside of the current situation is literally the 24 or 28 
bytes per object that we look at. For most operations, we don't even look 
at that many objects, so it's really the worst-case things.

> In fact, this "cache" can be created with a usual object traversal
> (which has the original memory requirement), but as long as we do
> not add objects to the database, further traversals would only need
> a fraction of memory.

Right. If the project is totally read-only, the cache would work well.

For real development, it would SUCK. It would make things like "git reset" 
very expensive indeed, for example (you'd have to unwind the whole cache: 
either regenerating it - which would take minutes - or being very careful 
indeed and being able to always remove objects properly and keeping track 
of them 100%).

IOW, it's nasty nasty nasty. And it doesn't really even help anything but 
a case that we actually already handle really well (I spent a lot of 
effort on making the memory footprint minimal).

But it does mean that you do NOT want to traverse a hundred different 
project "as if" they were one. That's really the only thing it means.

And since you can do submodules as independent projects, and you SHOULD do 
them that way for tons of other reasons _anyway_, even that isn't a reason 
to screw up all the _wonderful_ properties of the git object database.

So what I'm trying to say is that the immutable non-backpointer nature of 
the git database is what makes it so WONDERFUL. It's efficient, it's 
dense, it's stable, and it allows us all the clever things we do. But it 
means that we do end up always spending 28 bytes per object, and we can 
never throw those 28 bytes away during a single "traversal" run.
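
To put the per-object figure in perspective, here is a back-of-the-envelope sketch in Python. The 28 bytes come from the text above; the object counts are purely hypothetical and chosen only to show how the cost scales when a hundred projects are traversed "as if" they were one.

```python
BYTES_PER_OBJECT = 28  # the text above says 24 or 28; the larger value is used here


def traversal_overhead(num_objects: int, bytes_per_object: int = BYTES_PER_OBJECT) -> int:
    """Bytes of per-object bookkeeping that stay allocated for a whole traversal."""
    return num_objects * bytes_per_object


def as_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / (1024 * 1024)


# Hypothetical object counts, not measurements of any real repository.
one_project = traversal_overhead(1_000_000)
hundred_projects = traversal_overhead(100 * 1_000_000)

print(f"one project:      {as_mib(one_project):.1f} MiB")
print(f"hundred projects: {as_mib(hundred_projects):.1f} MiB")
```

The cost is linear in the number of objects touched, so traversing the submodules one at a time keeps the peak near the single-project figure.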

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
  2006-12-03  2:07                                                                                             ` Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT] Josef Weidendorfer
  2006-12-03  2:25                                                                                               ` Linus Torvalds
@ 2006-12-03  2:46                                                                                               ` Shawn Pearce
  2006-12-03  3:21                                                                                                 ` Josef Weidendorfer
  1 sibling, 1 reply; 252+ messages in thread
From: Shawn Pearce @ 2006-12-03  2:46 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Linus Torvalds, Martin Waitz, sf, git

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> Thinking even one step further:
> Would it make sense to define an encoding format for the content of
> commit and tree objects inside of packs, where the SHA1 is replaced by the
> offset of the object in this pack?
> As the SHA1 is exactly the least compressible thing, this could promise
> quite a benefit.

I actually had the same idea the other day.  I discarded it after
thinking about it for a minute.  Here's the problem:

Let's say we do this for the tree and parent IDs in a commit, because
these are the most commonly needed part of a commit during revision
traversal.  So we want to put the offset to the tree and the offset
to each parent at the front of the commit somehow to make them very
cheap to access.

This means that when we start to write out a commit we need to know
the offset to the tree that commit references.  But git-pack-objects
sorts object by type: commit, tree, blob (I forget where tags go,
but they aren't important in this context).  So generally *all*
commits appear before the first tree.  So when we write out the first
commit we need to know exactly how many bytes every commit will need
(compressed mind you) in this pack so we can determine the position
of the first tree.  Now do this for every commit and every tree
that those commits use...  yes, it's a lot of work to precompute
and store all offsets before you even write out the first byte.
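
A simplified sketch of why this is awkward, in Python. It is not how git-pack-objects lays out a pack: it assumes objects are just zlib streams written back to back after a 12-byte pack header, and it ignores per-object headers and deltas. The function name layout_offsets is made up for this example.

```python
import zlib


def layout_offsets(objects, header_size=12):
    """Return {name: offset} for objects written back to back in the given order.

    objects is a list of (name, payload_bytes) pairs. Every payload has to
    be compressed before the offset of anything that follows it is known.
    """
    offsets = {}
    position = header_size
    for name, payload in objects:
        offsets[name] = position
        position += len(zlib.compress(payload))
    return offsets


commits = [("c1", b"commit one " * 10), ("c2", b"commit two " * 10)]
trees = [("t1", b"tree one")]

offsets = layout_offsets(commits + trees)
# The first tree's offset depends on the compressed size of *every* commit.
print(offsets["t1"])
```

If the commit payloads themselves embedded offsets["t1"], their compressed sizes could change once the offset is filled in, which is the circularity described above.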

It's even worse with parent commits because ancestors tend to appear
behind the commit (newest->oldest) so that "git log" can benefit
from OS read-ahead.  So you also have to keep track of your parent
commit offsets.  Not pretty.

Extending that idea to tree objects (store the offset of the entry)
makes the issue even uglier.

Oh, and packs aren't entirely self-contained.  A pack is only self
contained in the sense that no object in the pack deltafies against
an object outside of the pack[1].  However by design an object
(e.g. a commit or a tree) can reference an object which is either
loose or which is in another pack.  This is especially important
for very large projects where not every commit/tree/tag/blob will
fit into one 4 giB file.

[1] Except in the case of thin packs, which are used only on the
network and only to save bandwidth.

> AFAIK, we currently only use these offsets for referencing objects in
> delta chains.

Yes, that's a recent feature to reference a delta base.

-- 

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
  2006-12-03  2:46                                                                                               ` Shawn Pearce
@ 2006-12-03  3:21                                                                                                 ` Josef Weidendorfer
  2006-12-03 11:10                                                                                                   ` Jakub Narebski
  0 siblings, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-03  3:21 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Linus Torvalds, Martin Waitz, sf, git

On Sunday 03 December 2006 03:46, Shawn Pearce wrote:
> Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> > Thinking even one step further:
> > Would it make sense to define an encoding format for the content of
> > commit and tree objects inside of packs, where the SHA1 is replaced by the
> > offset of the object in this pack?
> > As the SHA1 is exactly the least compressible thing, this could promise
> > quite a benefit.
> [...]
> 
> This means that when we start to write out a commit we need to know
> the offset to the tree that commit references.  But git-pack-objects
> sorts object by type: commit, tree, blob (I forget where tags go,
> but they aren't important in this context).  So generally *all*
> commits appear before the first tree.  So when we write out the first
> commit we need to know exactly how many bytes every commit will need
> (compressed mind you) in this pack so we can determine the position
> of the first tree.  Now do this for every commit and every tree
> that those commits use...  yes, it's a lot of work to precompute
> and store all offsets before you even write out the first byte.

Yes, it looks like a chicken-and-egg problem, but IMHO you can
handle it nicely with another redirection, i.e. a table you build
up while repacking the file, and storing this table at the end.

You simply sequentially renumber any object SHA, starting from 0
in the order you see them. You can do two renumberings, one for
the objects contained in the original pack (1), and one for the
external ones (2). Put these new numbers (with a bit distinguishing
(1) and (2)) as replacement into commit/tree objects.
At the end, you have the new offsets for objects in (1). Put
redirection tables for (1) [new number -> new offset]
and (2) [other new number->SHA1 of external object] at the end
of the new pack.
This way, you effectively have removed all incompressible SHAs from
the pack file aside from one entry in the redirection tables for
each external object.
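
A small sketch of the renumbering in Python, under simplifying assumptions: references are plain SHA1 strings, the new offsets of in-pack objects are handed in as a dict once they are known, and the "bit" distinguishing (1) from (2) is modelled as the first element of a tuple. The function name build_tables is hypothetical.

```python
def build_tables(references, new_offsets):
    """Renumber SHA1 references in the order they are seen.

    references:  SHA1 strings as they appear in commit/tree objects.
    new_offsets: {sha1: offset} for the objects contained in this pack.
    Returns (encoded, internal_table, external_table) where
      encoded        is a list of (kind, number); kind 0 = in pack, 1 = external
      internal_table maps number -> new offset          (table 1)
      external_table maps number -> SHA1 of the object  (table 2)
    """
    internal = {}  # sha1 -> number, objects inside this pack
    external = {}  # sha1 -> number, objects outside this pack
    encoded = []
    for sha1 in references:
        if sha1 in new_offsets:
            kind, numbering = 0, internal
        else:
            kind, numbering = 1, external
        # A SHA1 keeps the number it got the first time it was seen.
        number = numbering.setdefault(sha1, len(numbering))
        encoded.append((kind, number))

    internal_table = [None] * len(internal)
    for sha1, number in internal.items():
        internal_table[number] = new_offsets[sha1]
    external_table = [None] * len(external)
    for sha1, number in external.items():
        external_table[number] = sha1
    return encoded, internal_table, external_table


encoded, table1, table2 = build_tables(
    ["t1", "p1", "t1", "x9"],   # made-up short names standing in for SHA1s
    {"t1": 300, "p1": 120},     # offsets known only after writing the pack
)
```

Only table (2) still contains SHA1s. Table (1) holds offsets, so getting the SHA1 of an in-pack object back from its offset remains an open question.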

The only problem I see is how to decode the objects, i.e. how to
get the original SHA1 from an offset: we can not recalculate the
SHA1 from the object content as we changed the content itself.
But there should be a way to store the SHA1 in front of the object
somehow, perhaps it is already given by the current format? 

Am I missing something here?

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-03  0:55                                                                                   ` Josef Weidendorfer
@ 2006-12-03  6:29                                                                                     ` Martin Waitz
  0 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-03  6:29 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Linus Torvalds, sf, git

[-- Attachment #1: Type: text/plain, Size: 1303 bytes --]

hoi :)

On Sun, Dec 03, 2006 at 01:55:08AM +0100, Josef Weidendorfer wrote:
> On Saturday 02 December 2006 21:24, Martin Waitz wrote:
> > On Sat, Dec 02, 2006 at 12:17:44AM +0100, Josef Weidendorfer wrote:
> > > After some thinking, a submodule namespace even is important for checking
> > > out only parts of a supermodule, exactly because the root of a submodule
> > > potentially can change at every commit.
> > 
> > have you ever thought about the idea that the location may be an
> > important thing to consider for your decision.
> 
> Which decision, for what? Sorry, I do not understand.

to check out, or not to check out.

> What should such a general partial tree support look like? I suppose you
> want to configure paths which should not be checked out. As long as you
> say that a given submodule always has to exist at a given path, you are
> right: then, you can say: "Please, do not check out this submodule" which
> is the same as "Do not check out this path". 

You could say something like "do not check out anything below test/".
If then some submodule moves from "test/foo" to "build/foo", it will be
checked out, because this module is now not only used for testing, but
is needed for building in the new version of the supermodule.
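
As a sketch of what such a path-based rule could look like (Python, with a made-up function name and no relation to any existing git option):

```python
def is_checked_out(path, excluded_prefixes):
    """Return False if path lies at or below one of the excluded directories."""
    path = path.strip("/")
    for prefix in excluded_prefixes:
        prefix = prefix.strip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return True


excluded = ["test/"]
print(is_checked_out("test/foo", excluded))   # the old location is skipped
print(is_checked_out("build/foo", excluded))  # after the move it is checked out
```

The decision follows the path alone, so the same submodule is skipped or checked out depending on where a given supermodule version places it.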

-- 
Martin Waitz


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 19:41                                                                                       ` Linus Torvalds
@ 2006-12-03  9:19                                                                                         ` Torgil Svensson
  2006-12-03 17:54                                                                                           ` Linus Torvalds
  2006-12-03 19:33                                                                                         ` Andy Parkins
                                                                                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-03  9:19 UTC (permalink / raw)
  To: Linus Torvalds, sf-gmane, sf, git, Martin Waitz

On 12/2/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> In other words, I don't think people expect or want something hugely more
> complicated than the CVS/modules kind of file.

What about the case when you want _everything_, do you then have to
know the names of all submodules, present and past?

If you have an old irrelevant submodule in the history that happens to
have the same name as one of those you are interested in, do you get
this as well?

During a debugging session it might be convenient to do an "all but X"
kind of fetch if you have a project dependent on several small modules
and one of them is the big black sheep.

For simple cases, I think it's sufficient to have the "everyone or
no-one" option. If git enforces sending submodules one by one and
requires the fetching side to specify links explicitly, couldn't the
selection be left to the user to decide with "hooks" or plumbing?
A default hook could implement a simple white- or black-list.
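
A minimal sketch of such a default selection in Python. Nothing here is an existing git hook; the function name and the module names are made up for illustration.

```python
def select_modules(available, whitelist=None, blacklist=()):
    """Pick the submodules to fetch.

    whitelist=None means "everything"; otherwise only listed names pass.
    Names in blacklist are always dropped, which gives "all but X".
    """
    chosen = []
    for name in available:
        if whitelist is not None and name not in whitelist:
            continue
        if name in blacklist:
            continue
        chosen.append(name)
    return chosen


modules = ["libsmall1", "libsmall2", "bigblacksheep"]
print(select_modules(modules))                               # everything
print(select_modules(modules, blacklist={"bigblacksheep"}))  # all but X
print(select_modules(modules, whitelist={"libsmall1"}))      # explicit choice
```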

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
       [not found]                                                                                         ` <200612021255.59972.Josef.Weidendorfer@gmx.de>
@ 2006-12-03  9:42                                                                                           ` Andy Parkins
  0 siblings, 0 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-03  9:42 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Git Mailing List

On Saturday 2006, December 02 11:55, Josef Weidendorfer wrote:

> > > 	100644 blob 08602f522183dc43787616f37cba9b8af4e3dade	xdiff-interface.c
> > > 	100644 blob 1346908bea31319aabeabdfd955e2ea9aab37456	xdiff-interface.h
> > > 	040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2	xdiff
> > > 	050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704	xyzzy
> > >
> > > where that 050000 is the new magic type (I picked one out of my *ss:
> > > it's not a valid type for a file mode, so it's a good choice, but it
> > > could be anything that cannot conflict with a real file), which just
> > > specifies the "link" part. The SHA1 is the SHA1 of the commit, and the
> > > "xyzzy" is obviously just the name within the directory of the
> > > submodule.
> >
> > Can I argue that the hash in that object should actually be to a real
> > object in the supermodule repository rather than a link?
>
> That is the thing we already are discussing here :-)
> IMHO, submodule IDs make a lot of sense, and this needs to specify the
> submodule ID at every link. Which would force us to use separate objects.

I wasn't really going as deep as a submodule ID.  Just moving the submodule 
commit hash from the supermodule tree, to a supermodule "link" object.  What 
goes in that object is a separate problem I believe.

The primary reason I think it's a good idea is that it is consistent with 
every other hash in the tree.  It seems to be inconsistent to say

blob objects have a hash that points to an object in this repo
tree objects have a hash that points to an object in this repo
link objects have a hash that points to an object in a different repo

> However, I am not speaking about some separation issue, but more about a
> design decision. For fetching/pulling/merging, you want be able to
> distinguish submodules not only by the commit id into the submodule:
> multiple
> submodules could link into the same DAG (but different branches) of another
> repository which would make unique fetching/pulling/merge decisions
> difficult, especially when you think about the possibility that the
> relative root path of a submodule in a supermodule should be able to change
> at any supermodule commit.

I can't say I've understood what you mean here.  There is no difference in 
facilities if there is a link object in the local repository as well.  It's 
merely an extra layer of indirection.  Apart from the tiny cost of 
dereferencing that link object, there is no disadvantage.


Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-03  1:24   ` Robin Rosenberg
  2006-12-03  1:31     ` Jakub Narebski
@ 2006-12-03 11:00     ` Jakub Narebski
  1 sibling, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-03 11:00 UTC (permalink / raw)
  To: git

Robin Rosenberg wrote:

> lördag 02 december 2006 21:16 skrev Jakub Narebski:
>>
>> The problem with submodule as separate git repository is that if you
>> move submodule (subproject) somewhere else in the repository (or just
>> rename it), you have to update alternates file... and this happens not
>> only on move itself, but also on checkout and reset. But that can be
>> managed by having in alternates all possible places the submodule ends
>> into. I don't know if it is truly a problem.
> 
> A nasty problem with separate repositories for submodules is that when you 
> screw up and git complains about everything you try to do, you previously 
> could do rm -rf *; git reset --hard and retry whatever you were trying to do. 
> With separate repositories your submodules will be resting in /dev/null, 
> unless you're very, very careful. 

One solution to this concern would be to keep the submodule's GIT_DIR
outside its working area, for example somewhere in the GIT_DIR of
the supermodule, and to use either a symlink or a (to be coded) .gitlink
file holding a symbolic reference to that GIT_DIR. The disadvantage is
that this shifts the trouble with moving a subproject from the alternates
file to the GIT_DIR link representation in the submodule (although a
simple rename of the subproject directory causes no trouble).
As Linus said, there are advantages to having submodule repository use
separate object database (clone and other operation scaling, index size),
and I think they outweigh the troubles with moving/renaming the directory
submodule resides in.

P.S. I think that the problem with bad performance of too large index
is similar to the problems filesystems have with directories with large
number of files; some filesystems solved this problem.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
  2006-12-03  3:21                                                                                                 ` Josef Weidendorfer
@ 2006-12-03 11:10                                                                                                   ` Jakub Narebski
  2006-12-03 11:47                                                                                                     ` Josef Weidendorfer
  0 siblings, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-12-03 11:10 UTC (permalink / raw)
  To: git

Josef Weidendorfer wrote:

> On Sunday 03 December 2006 03:46, Shawn Pearce wrote:
>> Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
>>> Thinking even one step further:
>>> Would it make sense to define an encoding format for the content of
>>> commit and tree objects inside of packs, where the SHA1 is replaced by
>>> the offset of the object in this pack?
>>> As the SHA1 is exactly the least compressible thing, this could promise
>>> quite a benefit.
>> [...]
>> 
>> This means that when we start to write out a commit we need to know
>> the offset to the tree that commit references.  But git-pack-objects
>> sorts object by type: commit, tree, blob (I forget where tags go,
>> but they aren't important in this context).  So generally *all*
>> commits appear before the first tree.  So when we write out the first
>> commit we need to know exactly how many bytes every commit will need
>> (compressed mind you) in this pack so we can determine the position
>> of the first tree.  Now do this for every commit and every tree
>> that those commits use...  yes, it's a lot of work to precompute
>> and store all offsets before you even write out the first byte.
> 
> Yes, it looks like a chicken-and-egg problem, but IMHO you can
> handle it nicely with another redirection, i.e. a table you build
> up while repacking the file, and storing this table at the end.
> 
> You simply sequentially renumber any object SHA, starting from 0
> in the order you see them. You can do two renumberings, one for
> the objects contained in the original pack (1), and one for the
> external ones (2). Put these new numbers (with a bit distinguishing
> (1) and (2)) as replacement into commit/tree objects.
> At the end, you have the new offsets for objects in (1). Put
> redirection tables for (1) [new number -> new offset]
> and (2) [other new number->SHA1 of external object] at the end
> of the new pack.
> This way, you effectively have removed all incompressible SHAs from
> the pack file aside from one entry in the redirection tables for
> each external object.
> 
> The only problem I see is how to decode the objects, i.e. how to
> get the original SHA1 from an offset: we can not recalculate the
> SHA1 from the object content as we changed the content itself.
> But there should be a way to store the SHA1 in front of the object
> somehow, perhaps it is already given by the current format? 
> 
> Am I missing something here?

Doesn't this idea clash with the object and delta reusing for repack?
Hmmm... perhaps with the two indirect tables it wouldn't, only
the tables would need to be recalculated... or perhaps it would because
of offset clashes.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
  2006-12-03 11:10                                                                                                   ` Jakub Narebski
@ 2006-12-03 11:47                                                                                                     ` Josef Weidendorfer
  0 siblings, 0 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-03 11:47 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Sunday 03 December 2006 12:10, Jakub Narebski wrote:
> > You simply sequentially renumber any object SHA, starting from 0
> > in the order you see them. You can do two renumberings, one for
> > the objects contained in the original pack (1), and one for the
> > external ones (2). Put these new numbers (with a bit distinguishing
> > (1) and (2)) as replacement into commit/tree objects.
> > At the end, you have the new offsets for objects in (1). Put
> > redirection tables for (1) [new number -> new offset]
> > and (2) [other new number->SHA1 of external object] at the end
> > of the new pack.
> 
> Doesn't this idea clash with the object and delta reusing for repack?

In general, yes: you modify the object content by encoding it, so if you want
to reuse the objects without decompression, you need to keep the info
required to decode them, i.e. the redirection table.

This gets problematic if you want to join multiple packs, or fetch objects
and want to reuse the compressed representation, as the object renumbering
is only local to one pack, and numbers can be reused between packs.

So this idea is probably for archival packs only.


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-03  1:31     ` Jakub Narebski
@ 2006-12-03 12:22       ` Robin Rosenberg
  2006-12-03 12:31         ` Jakub Narebski
  0 siblings, 1 reply; 252+ messages in thread
From: Robin Rosenberg @ 2006-12-03 12:22 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

söndag 03 december 2006 02:31 skrev Jakub Narebski:
> Actually, rm -rf * is not needed for "git reset --hard" or
> "git checkout -f" to succeed.

True, but git reset --hard isn't always enough and rm -rf * is the good ol' 
way of resetting things. Typically this comes from make being upset (or too 
content) with something.


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-03 12:22       ` Robin Rosenberg
@ 2006-12-03 12:31         ` Jakub Narebski
  0 siblings, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-03 12:31 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Robin Rosenberg wrote:
> söndag 03 december 2006 02:31 skrev Jakub Narebski:
>
>> Actually, rm -rf * is not needed for "git reset --hard" or
>> "git checkout -f" to succeed.
> 
> True, but git reset --hard isn't always enough and rm -rf * is the good ol' 
> way of resetting things. Typically this comes from make being upset (or too 
> content) with something.

$ git clean -d -x -q
$ git reset --hard

(or vice versa) then.

Besides, "rm -rf *" should not remove hidden files and hidden directories,
including the .git directory. And you should take great care with "rm -rf .*"
to make sure that it doesn't follow '..'
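
The first point can be checked without risking a real working tree. The sketch below uses Python's glob module as a stand-in for the shell: by default it follows the same rule that "*" does not match names starting with a dot. It says nothing about how ".*" treats '..' in a given shell. The file names are arbitrary.

```python
import glob
import os
import tempfile


def visible_entries(directory):
    """Names matched by '*' in directory, using glob's shell-like rules."""
    pattern = os.path.join(directory, "*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as workdir:
    # A fake working tree: one hidden directory, one hidden file, two visible entries.
    os.mkdir(os.path.join(workdir, ".git"))
    os.mkdir(os.path.join(workdir, "src"))
    open(os.path.join(workdir, "Makefile"), "w").close()
    open(os.path.join(workdir, ".gitignore"), "w").close()
    matched = visible_entries(workdir)

print(matched)  # .git and .gitignore are not in the list
```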
-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-03  9:19                                                                                         ` Torgil Svensson
@ 2006-12-03 17:54                                                                                           ` Linus Torvalds
  2006-12-04 20:26                                                                                             ` Torgil Svensson
  0 siblings, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-12-03 17:54 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: sf-gmane, sf, git, Martin Waitz

On Sun, 3 Dec 2006, Torgil Svensson wrote:
>
> On 12/2/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > 
> > In other words, I don't think people expect or want something hugely more
> > complicated than the CVS/modules kind of file.
> 
> What about the case when you want _everything_, do you then have to
> know the names of all submodules, present and past?

Afaik, the way people do this historically is simply:

 - often have an alias for "everything" (eg "all" or "src" or "world"), 
   and if you want everything, you basically ask for it by checking out 
   the "src" module.

   Ie this is the "upstream" way to let downstream check out everything.

 - if you're downstream, and you have a partial repo, and you realize that 
   you want everything else, you just look at gitweb (assuming it is 
   extended to show module information, of course ;) or the .gitmodules 
   (or whatever it would be called) file to get the other pieces manually.

But hey, I also think it would be fine to have "git clone --allmodules" or 
something ("fetch" too). I think this whole question will depend more on 
how people end up _using_ module support than on any technical issues per 
se. Again, I suspect the people who now set up modules in CVS are likely 
to have a better idea than I do about how they usually do it (and why).

> If you have an old irrelevant submodule in the history that happens to
> have the same name as one of them you are interested in, do you get
> this as well?

I dunno. Details, details. I'm also not sure this is hugely important.

It could be "solved" by simply having the requirement that all modules 
need to be named differently (notice that "module name" is _not_ the same 
thing as "the directory name where the module shows up". That's not the 
case even in CVS modules, and with a "link" type in the git tree object, 
the directory where a module shows up would basically be totally 
independent of the "name" of the module).
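
A sketch of that separation in Python. The table layout, the module names and the paths are all hypothetical; this is not a proposed file format, just an illustration of names and aliases resolving independently of directories.

```python
# Hypothetical module table: module name -> directory in the supermodule tree.
MODULE_PATHS = {"kernel": "vendor/linux", "libX": "lib/x", "docs": "Documentation"}

# Hypothetical aliases in the spirit of "all" / "src" / "world".
ALIASES = {"src": ["kernel", "libX"], "all": ["src", "docs"]}


def expand(name, modules=MODULE_PATHS, aliases=ALIASES):
    """Return the module names selected by name (a module or an alias)."""
    result, stack, seen = [], [name], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against aliases that refer to each other
        seen.add(current)
        if current in aliases:
            stack.extend(reversed(aliases[current]))
        elif current in modules:
            result.append(current)
        else:
            raise KeyError(f"unknown module or alias: {current}")
    return result


def checkout_paths(name, modules=MODULE_PATHS, aliases=ALIASES):
    """Directories that asking for name would populate."""
    return [modules[m] for m in expand(name, modules, aliases)]


print(expand("all"))
print(checkout_paths("src"))
```

Moving "libX" to another directory only changes one value in the table; the name people ask for stays the same.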

> During a debugging session it might be convenient to do a "all but X"
> kind of fetch if you have a project dependent on several small modules
> and one of them is the big black sheep.

I suspect it's more common to name the modules you want to fetch 
explicitly, rather than make it a "negative" choice, but that sounds 
largely like just an interface issue.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 19:41                                                                                       ` Linus Torvalds
  2006-12-03  9:19                                                                                         ` Torgil Svensson
@ 2006-12-03 19:33                                                                                         ` Andy Parkins
  2006-12-05  2:33                                                                                         ` Daniel Barkalow
  2006-12-09 21:34                                                                                         ` R. Steve McKown
  3 siblings, 0 replies; 252+ messages in thread
From: Andy Parkins @ 2006-12-03 19:33 UTC (permalink / raw)
  To: git

On Saturday 2006, December 02 19:41, Linus Torvalds wrote:

> Now, I'm not exactly sure who wants to use git modules, so this is the
> time to ask: did you hate the CVS/modules file? Or was it something you
> set up once, and then basically forgot about? People clearly use the
> ability to mark certain modules as depending on each other, and aliases to
> say "if you ask for this module, you actually get a set of _these_
> modules".

Never used CVS/modules, but I used svn:externals.  I have a few projects that 
are libraries that I use in many other projects.  So, my directory tree looks 
like this:

 projects/
   libX/
   projectP/
    libX/
   projectQ/
    libX/

The nightmare I had was that I would add a feature to projectP/libX, and 
commit it.  Great.  Then later I'd do "svn update" in projectQ - HAVING MADE 
NO CHANGES TO IT - and libX would update to the latest version, which turns 
out to be incompatible with projectQ, and I can no longer even build 
projectQ.  If only libX would stay where it was put.  The worst of it is if 
you check out an older version, say "stable-release" that you tagged last 
year, the svn:external would always just check out the latest version, so 
you'd have to go back through the logs to find out what approximate submodule 
revision you should really check out, check it out and then remember not to 
do svn update, because that would just reset the external to the latest 
version.  AHHHHHHH!  Maddening to say the least.

This fits exactly with what you have described as the primary reason for 
wanting submodules.  I didn't want seamless integration, I was happy to 
change into projectP/libX to make libX commits.  All I actually wanted was 
the particular checkout of libX for a particular checkin of projectP to be 
remembered.  That's it.  Anything else is just gravy.
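
In data-structure terms the wish is tiny. A sketch in Python, with invented commit names, contrasting an external that always means "latest" with a link that is recorded per parent commit:

```python
# What svn:externals gave me: a location only, so a checkout always gets "latest".
EXTERNAL_LATEST = {"libX": "libX-rev-9"}

# What I want: each projectP commit remembers the libX commit it was built with.
# All commit names below are invented for the example.
PINNED = {
    "projectP-stable-release": {"libX": "libX-rev-3"},
    "projectP-head": {"libX": "libX-rev-9"},
}


def external_checkout(path):
    """svn:externals behaviour: the parent revision is ignored."""
    return EXTERNAL_LATEST[path]


def pinned_checkout(parent_commit, path):
    """Submodule behaviour: the parent commit decides the submodule commit."""
    return PINNED[parent_commit][path]


print(external_checkout("libX"))                           # always the latest
print(pinned_checkout("projectP-stable-release", "libX"))  # what was recorded
```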

I'm doing exactly the same sort of thing now but with git.  git hasn't fixed 
the problem (yet) but certainly hasn't made it any worse than it was.  
svn:externals were nothing more than a way of storing a URL in the 
repository - who cares, I wish now I'd never bothered, they serve no version 
control purpose and are merely a UI convenience.

Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 20:44                                                                                         ` Linus Torvalds
  2006-12-02 21:06                                                                                           ` Martin Waitz
  2006-12-02 21:22                                                                                           ` Linus Torvalds
@ 2006-12-03 20:46                                                                                           ` Martin Waitz
  2 siblings, 0 replies; 252+ messages in thread
From: Martin Waitz @ 2006-12-03 20:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, sf, git


hoi :)

On Sat, Dec 02, 2006 at 12:44:20PM -0800, Linus Torvalds wrote:
> And watch the memory usage.

hmm, really sad, it was such a nice concept until now...
You are right, I have to think more about scalability. O(N) anywhere is
really bad for submodules.  They really should be able to bundle the kernel,
mozilla, qt and whatnot into one project and that will get huge.

-- 
Martin Waitz


* Re: [RFC] Submodules in GIT
  2006-12-02  0:12                                                                                     ` Linus Torvalds
                                                                                                         ` (2 preceding siblings ...)
  2006-12-02 20:18                                                                                       ` Martin Waitz
@ 2006-12-03 22:16                                                                                       ` Sven Verdoolaege
  2006-12-03 22:32                                                                                         ` Linus Torvalds
  3 siblings, 1 reply; 252+ messages in thread
From: Sven Verdoolaege @ 2006-12-03 22:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, sf, git, Martin Waitz, sf

On Fri, Dec 01, 2006 at 04:12:10PM -0800, Linus Torvalds wrote:
> So within the supermodule, on a "git object" level, a submodule should 
> just be named by the SHA1 that was its HEAD when it was committed within 
> the supermodule. So in the "tree object", you'd see something like the 
> following when you go "git ls-tree HEAD" on the superproject:
> 
> 	...
> 	100644 blob 08602f522183dc43787616f37cba9b8af4e3dade	xdiff-interface.c
> 	100644 blob 1346908bea31319aabeabdfd955e2ea9aab37456	xdiff-interface.h
> 	040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2	xdiff
> 	050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704	xyzzy
> 
> where that 050000 is the new magic type (I picked one out of my *ss: it's 
> not a valid type for a file mode, so it's a good choice, but it could be 
> anything that cannot conflict with a real file), which just specifies the 
> "link" part. The SHA1 is the SHA1 of the commit, and the "xyzzy" is 
> obviously just the name within the directory of the submodule.
> 
> That's all that is actually required for a lot of git commands that 
> already expect all objects to be available (ie "git checkout", "git diff" 
> etc).

But is this object (and all the objects it points to) going to be
available (in the superproject) ?
The following seems to suggest that you think they shouldn't.
How is fsck-objects then going to check that such an object is
valid ?  Is it going to call fsck-objects recursively on the
(available) submodules ?

On Fri, Dec 01, 2006 at 03:30:32PM -0800, Linus Torvalds wrote:
> The only thing that a submodule must NOT be allowed to do on its own is 
> pruning (and its distant cousin "git repack -d").

How are you going to enforce this if the submodule isn't supposed
to know that it is being used as a submodule ?

> You must always prune 
> from the supermodule, because the submodule cannot really know on its own 
> what references point into it.

How is one of the supermodules going to know what references from other
supermodules containing the submodule point into the submodule ?



* Re: [RFC] Submodules in GIT
  2006-12-03 22:16                                                                                       ` Sven Verdoolaege
@ 2006-12-03 22:32                                                                                         ` Linus Torvalds
  2006-12-03 22:49                                                                                           ` Jakub Narebski
  2006-12-04 11:12                                                                                           ` Josef Weidendorfer
  0 siblings, 2 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-03 22:32 UTC (permalink / raw)
  To: skimo; +Cc: Josef Weidendorfer, sf, git, Martin Waitz, sf

On Sun, 3 Dec 2006, Sven Verdoolaege wrote:
> 
> On Fri, Dec 01, 2006 at 03:30:32PM -0800, Linus Torvalds wrote:
> > The only thing that a submodule must NOT be allowed to do on its own is 
> > pruning (and its distant cousin "git repack -d").
> 
> How are you going to enforce this if the submodule isn't supposed
> to know that it is being used as a submodule ?

Note that there's actually two "submodules":

 - there's the submodule "project" itself.

   This one must be totally unaware of the supermodule, because this one 
   might be cloned and copied _independently_ of the supermodule.

 - there's the PARTICULAR CHECKED-OUT COPY of the submodule that is 
   actually checked out in a supermodule.

   This is just a specific _instance_ of the particular submodule.

So a particular instance of a submodule might be "aware" of the fact that 
it's a submodule of a supermodule. For example, the "awareness" might be 
as simple as just a magic flag file inside its .git/ directory. And that 
awareness would be what simply disabled pruning or "repack -d" within that 
particular instance.
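
As a rough sketch of that idea, assume a marker file with the made-up name
"superproject-instance" inside the instance's .git/ directory (git defines no
such file; the name and both helper functions are purely illustrative). A
wrapper around pruning would consult the marker before doing anything:

```python
import os
import tempfile

# Hypothetical marker name; git itself defines no such file.
MARKER = "superproject-instance"


def may_prune(git_dir):
    """Return True if this repository instance may prune on its own."""
    return not os.path.exists(os.path.join(git_dir, MARKER))


def mark_as_submodule_instance(git_dir):
    """Record, privately to this checkout, that a supermodule uses it."""
    with open(os.path.join(git_dir, MARKER), "w") as f:
        f.write("")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        print(may_prune(git_dir))           # plain clone: pruning allowed
        mark_as_submodule_instance(git_dir)
        print(may_prune(git_dir))           # flagged instance: pruning refused
```

The marker lives only in that one checkout, so an independent clone of the
same submodule project never sees it.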

But this magic flag doesn't affect the bigger-picture git repository. It's 
a _private_ flag. So it doesn't affect the git part, any more than it 
really affects the git repository that you may have a

	[user]
		name = Myname
		email = myemail

in your .git/config file.

See? You can have private data in a git repository, but that doesn't mean 
that it's visible as _repository_ data. But it can still affect how git 
commands act (eg the "user" definitions above will affect the default user 
information that "git commit" uses, of course, without actually affecting 
the git archive in any other way).

> How is one of the supermodules going to know what references from other
> supermodules containing the submodule point into the submodule ?

Why would it care? They are other supermodules. It doesn't matter, the 
same way it doesn't matter that _my_ "git" tree may not have all the same 
references that _your_ "git" repo has. If I want to get the same 
references, I'd need to fetch them from you, and at that point, I'd need 
to get all the objects that are pointed to by those refs too. But only on 
"git fetch" do you actually start caring.


* Re: [RFC] Submodules in GIT
  2006-12-03 22:32                                                                                         ` Linus Torvalds
@ 2006-12-03 22:49                                                                                           ` Jakub Narebski
  2006-12-04 11:12                                                                                           ` Josef Weidendorfer
  1 sibling, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-03 22:49 UTC (permalink / raw)
  To: git

Linus Torvalds wrote:

> On Sun, 3 Dec 2006, Sven Verdoolaege wrote:
>> 
>> On Fri, Dec 01, 2006 at 03:30:32PM -0800, Linus Torvalds wrote:
>>> The only thing that a submodule must NOT be allowed to do on its own is 
>>> pruning (and its distant cousin "git repack -d").
>> 
>> How are you going to enforce this if the submodule isn't supposed
>> to know that it is being used as a submodule ?
> 
> Note that there's actually two "submodules":
> 
>  - there's the submodule "project" itself.
> 
>    This one must be totally unaware of the supermodule, because this one 
>    might be cloned and copied _independently_ of the supermodule.
> 
>  - there's the PARTICULAR CHECKED-OUT COPY of the submodule that is 
>    actually checked out in a supermodule.
> 
>    This is just a specific _instance_ of the particular submodule.
> 
> So a particular instance of a submodule might be "aware" of the fact that 
> it's a submodule of a supermodule. For example, the "awareness" might be 
> as simple as just a magic flag file inside its .git/ directory. And that 
> awareness would be what simply disabled pruning or "repack -d" within that 
> particular instance.

If we use objects/info/alternates (or equivalent, e.g. objects/info/modules,
or modules file) in superproject to refer to submodule repository object
database (so superproject has access to all the objects including
submodule), I'd prefer to have in submodule objects/info/borrowers file,
which would point to superproject (and to other repositories which have
submodule as one of alternate object databases) for git-prune and friends
to check which parts are truly unreachable.

This would be generic solution to the problem with alternates, not only
specific to submodule support.
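
A minimal sketch of what a borrowers-aware prune could compute, assuming the
object graph and the refs of each borrower have already been read into plain
Python structures (the function names and the toy data are assumptions for
illustration, not an existing git interface):

```python
def reachable(graph, roots):
    """Return every object id reachable from roots.

    graph maps an object id to the list of ids it points to.
    """
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, []))
    return seen


def prunable(graph, own_refs, borrower_refs):
    """Objects unreachable from our own refs and from every borrower's refs.

    borrower_refs maps a borrower repository path to the object ids
    that repository references inside this object database.
    """
    roots = list(own_refs)
    for refs in borrower_refs.values():
        roots.extend(refs)
    return set(graph) - reachable(graph, roots)


if __name__ == "__main__":
    graph = {"c3": ["c2"], "c2": ["c1"], "c1": [], "x2": ["c1"], "x1": []}
    # "x2" is only referenced by a superproject listed as a borrower.
    print(sorted(prunable(graph, ["c3"], {"/path/to/super": ["x2"]})))
    print(sorted(prunable(graph, ["c3"], {})))
```

With the borrower listed, only "x1" is unreachable; without it, "x2" would be
pruned as well, which is exactly the failure the borrowers file is meant to
prevent.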
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [RFC] Submodules in GIT
  2006-12-03 22:32                                                                                         ` Linus Torvalds
  2006-12-03 22:49                                                                                           ` Jakub Narebski
@ 2006-12-04 11:12                                                                                           ` Josef Weidendorfer
  1 sibling, 0 replies; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-04 11:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: skimo, sf, git, Martin Waitz, sf, Jakub Narebski

On Sunday 03 December 2006 23:32, Linus Torvalds wrote:
> So a particular instance of a submodule might be "aware" of the fact that 
> it's a submodule of a supermodule. For example, the "awareness" might be 
> as simple as just a magic flag file inside its .git/ directory. And that 
> awareness would be what simply disabled pruning or "repack -d" within that 
> particular instance.

That prevents the problem in your supermodule and your instance of the
given submodule.

But IMHO, using a submodule commit which could be removed by pruning in
another instance of the submodule is really not the thing you ever want.
If you start your own branch in a submodule, and start to rely on it in
the supermodule, you _will_ want to push this to the submodule upstream.

And if you find that you have to rebase in the submodule, you simply
have to rewrite your branch commits in the supermodule too. Otherwise,
you effectively fork the submodule project purely for your superproject.

So I suppose that in practical use, pruning in submodules probably
would not have any negative effect. If it does, you did something
wrong. So you probably should only reference a submodule commit if it has
"publishing quality" (unless you are on a temporary supermodule branch).

Ie. any "borrowers" file should be empty.


* Re: [RFC] Submodules in GIT
  2006-12-02  0:33                                                                                     ` Linus Torvalds
  2006-12-02  9:27                                                                                       ` Andy Parkins
@ 2006-12-04 18:56                                                                                       ` Michael K. Edwards
  2006-12-05  1:31                                                                                         ` Sam Vilain
  1 sibling, 1 reply; 252+ messages in thread
From: Michael K. Edwards @ 2006-12-04 18:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, sf, git, Martin Waitz

(I wrote most of this a couple of days ago, so it's not at the tip of
the conversational tree, so to speak.  But it's effectively a response
to Linus's "what do you want to do with submodules" question, with
some thoughts on implementation.  Sorry it's so long; like Blaise
Pascal, "I would have written a shorter letter, but I did not have the
time.")

The supermodule concept, implemented right, could really improve
cooperation among embedded platform integrators, boutique distro
publishers, and other editorial contributors to sprawling metaprojects
who don't want to run kernel.org-scale mirrors.  To make this work,
you need sparse repositories (conserving resources when fetching, by
omitting the bulk of currently un-needed submodules that can reliably
be obtained later from elsewhere) and shallow cloning (conserving
resources when publishing, by referring cloners to a third-party
repository for universally available content).

For instance, it would be a wonderful thing if the pile-o-patches
nightmare that is PTXdist (and crosstool and buildtool and every other
approach I have seen for ongoing maintenance of embedded toolchains
and userlands) were obsoleted by a git supermodule.  Its submodules
would mostly track external projects, but would also logically contain
the fix-up patches worked out during platform integration, checked in
to branches anchored at each upstream release point.  The supermodule
would contain all of the build automation, log auditing, and remote
unit testing stuff, as well as the metadata for each submodule
involved in this platform build cycle.

At a content level, the sparsely populated / shallowly published
supermodule wouldn't be much different from today's PTXdist.  But the
pay-off comes when you merge forward to a new release of some base
component (compiler, library, etc.) and discover that some of your
fix-ups have been adopted or obsoleted upstream, and new fix-ups are
needed for components that depend on the updated bit, and the set of
configurables has changed (for which you need to compensate in the
meta-configurator).  Instead of piling up versioned patch directories,
you commit fix-ups to the sub-modules, which other integration
branches can ignore (if they aren't affected), merge, or cherry-pick.

As I understand it, in today's git, every content object is a patch to
the _data_ of one and only one git repository, containing the label of
the preceding _data_ state plus a diff of file contents and
attributes.  Assuming this model is retained, any clean state of a
"leaf" module (one with no submodules) can be reached by replaying a
series of patches, starting from the repository's root node (an empty
directory with the hopefully unique label generated by init-db).  The
label (SHA1) of the last patch is therefore a perfectly good label for
this _data_ state.

If all we were trying to do with supermodules was to capture and track
various states of the submodules' data, we could extend the format of
content objects to include "state X of submodule with init-db label
Y".  That would have the effect of capturing submodule states as
_data_ in non-"leaf" modules.  We would have to help cloners find a
place from which to pull these states, of course; and it's easy to get
sidetracked onto that part of the problem.  But that's not where the
bang for the buck is in supermodules.

The whole model of distributed supermodules, with references to
slightly diverging submodules whose content should mostly be fetched
from external sources, smells to me just like LVM.  The external
sources (like an LVM volume of which you have taken a "snapshot") make
up the bulk of the content pool.  They also give you a window into
developments on the submodule's own branches (like being able to peek
forward and merge changes from the original volume).  The supermodule
(the snapshot volume) provides most of the interesting refs (submodule
commits referenced by supermodule tags and branch heads), along with
enough "journaled" content to replay forward from some checkpoint
guaranteed to be available in each external source to any of these
refs.

The implication here is that submodule states are not just SHA1 labels
to be embedded within supermodule data diffs.  One ought to be able to
clone a supermodule without immediately cloning full copies of any of
its submodules.  This ought to populate the clone's content database
with all of the quanta of submodule content that aren't guaranteed to
be available from any not-too-stale submodule mirror.  When cloning,
you don't want to have to inspect every supermodule state for
submodule states that are outside the global subset.  So the
supermodule needs to maintain a set of supplemental refs from which
all referenced submodule states can be reached.  This allows you to
traverse the portion of the pool of submodule content that can't be
reached from true submodule branch heads.

On 12/1/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Yes, you do need to have a list of submodules somewhere, and you'd need to
> maintain that separately. One of the results of having the submodules be
> independent from the supermodule is that it's not all "automatically
> integrated", and thus the supermodule does end up having to have things
> like that maintained separately.

This is not a defect; it's a virtue.  It's important for every commit
to the supermodule to contain the information of which submodule
branches you're currently on and how far along them you've crawled.
Any particular supermodule commit point is likely to reflect an
integration milestone visible only to the person working at the
supermodule level.  No content object should ever cross a submodule
boundary, because then you wouldn't be able to apply it to the
submodule in isolation (or in another supermodule state) or identify
it when it is applied upstream and propagates back to you in a pull.
But the supermodule can also contain supplemental refs (heads and
tags) that don't exist in the submodule (and shouldn't necessarily be
pushed to it); the commits they refer to are localized to the
submodule but may not be reachable from any of the submodule's branch
heads.

> And yes, if you screw that up, you wouldn't be able to fetch submodules
> properly etc, even if you see the supermodule, and yes, this sounds more
> like the CVS "Entries" kind of file that is more "tacked on" than really
> deeply integrated. But I think the separation is _more_ than worth the
> fact that you can see things being separate.

There is an opportunity for useful deep integration here.  The same
algorithm that does reachability analysis for "git prune" can dig from
supermodule down to submodules, copying objects into the supermodule
database until it hits a commit that is advertised as "global" by the
submodule.  "git clone" of the supermodule can then pull the bulk of
the submodules (a superset of the "global" subset) from (a mirror of)
the canonical place for each, and use the supermodule object database
as an alternate source for commits that don't exist in the "canonical"
submodule.
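
A sketch of that traversal, restricted to commit objects for brevity (trees
and blobs would follow the same pattern). The names `journal_commits`,
`pinned` and `global_set`, and the toy history, are assumptions made for this
illustration only:

```python
def journal_commits(parents, pinned, global_set):
    """Collect the submodule commits the supermodule has to carry itself.

    parents    -- commit id -> list of parent commit ids
    pinned     -- commit ids referenced from the supermodule
                  (tree entries and supplemental refs)
    global_set -- commit ids the submodule advertises as "global",
                  i.e. obtainable from any mirror
    """
    keep = set()
    stack = list(pinned)
    while stack:
        commit = stack.pop()
        if commit in keep or commit in global_set:
            continue  # stop digging once the global boundary is reached
        keep.add(commit)
        stack.extend(parents.get(commit, []))
    return keep


if __name__ == "__main__":
    parents = {
        "fix2": ["fix1"],
        "fix1": ["v1.0"],
        "v1.0": ["v0.9"],
        "v0.9": [],
    }
    # Two local fix-up commits sit on top of a globally published release.
    print(sorted(journal_commits(parents, ["fix2"], {"v1.0", "v0.9"})))
```

Only "fix1" and "fix2" end up in the supermodule's database; everything from
"v1.0" downwards is left to the canonical mirrors.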

> In fact, I'm very much arguing for keeping things as separate as possible,
> while just integrating to the smallest possible degree (just _barely_
> enough that you can do things like "git clone" and it will fetch multiple
> repositories and put them all in the right places, and "git diff" and
> friends will do reasonably sane things).
>
> Keep it simple, stupid.

As simple as possible; but no simpler.  The "alternates" / "git clone
--reference" model is already almost powerful enough for the
supermodule to contain a "journal" of submodule commits that haven't
yet been retired to the canonical subset (guaranteed present in each
mirror).  The only difference is that the supermodule should be
considered a "weak alternates" source.  Commit objects in the
supermodule's database should be visible to submodule-level operations
(so that commits which are accepted upstream get flowed in nicely
during "git pull").

But if a commit becomes reachable from a ref that is really in the
submodule (not just one of the supermodule's "supplemental refs",
which should _not_ be visible to submodule operations), then it should
be copied into the submodule's object database.  (The refs internal to
the submodule should retain their integrity even if the supermodule is
inaccessible.)  The existing "strong alternates" mechanism should be
reserved for repos which are at least as public and persistent as the
submodule, and supermodules don't qualify (e.g., Linus's transmeta
scenario).

> On Sat, 2 Dec 2006, Josef Weidendorfer wrote:
> > The thing I wanted to discuss is whether such names would need to be globally
> > unique in the project containing submodules, or not.
>
> My preference would be for it to be "local", just because (as I
> mentioned), with mirroring etc, it might well be that you want to fetch
> things from the _closest_ repository. That's really not a global decision,
> it's a local one.

I think "global resource, local provider" is the way to go, with each
provider advertising what checkpoints of what resources it can supply.
When I clone or pull, I should be able to consult a local mapping of
submodule URIs to "mirrors" (which may well be local repositories
containing content and branches that aren't in the "official"
upstream).  The only thing that may need "global" agreement is the
boundaries of the "global" subset for each submodule, i.e., the set
of commit objects that can reliably be obtained from any mirror of the
"official" upstream repository.  That doesn't need to be terribly
clever; "at least three days old on a globally published branch" would
probably be a perfectly good heuristic.
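
Spelled out, that heuristic is a one-line predicate. The function and
parameter names below are assumptions for illustration; the three-day
threshold is the one suggested above:

```python
from datetime import datetime, timedelta, timezone


def in_global_subset(commit_time, on_published_branch, now,
                     min_age=timedelta(days=3)):
    """True if a commit can be assumed present on every mirror.

    commit_time         -- when the commit appeared on the branch
    on_published_branch -- whether that branch is globally published
    now                 -- the time of the check
    """
    return on_published_branch and (now - commit_time) >= min_age


if __name__ == "__main__":
    now = datetime(2006, 12, 4, tzinfo=timezone.utc)
    print(in_global_subset(now - timedelta(days=5), True, now))
    print(in_global_subset(now - timedelta(days=1), True, now))
```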

> > If yes, it IMHO makes a lot of sense to introduce "submodule objects" which contain
> > these submodule names, and which are used as pointers to submodule commits in
> > supermodule trees.
>
> You could do it that way, and then it would be global. It would work, and
> in many ways it would probably be "simpler" on a supermodule level.

I think the implication of "submodule objects" is that supermodule
diffs would say "roll submodule X from commit-id A to commit-id B".  I
don't think that would work very well for pulls/merges in the sparsely
populated scenario, because you want to be able to pull the
non-canonical subset of the individual diffs between states A and B
into the supermodule's object pool.  When you decide later to flesh
out submodule X, you should only have to clone some canonical mirror
and then fast-forward to state B using objects you already have in the
supermodule pool.

The merge case is even clearer.  Suppose I pull updates from two
remote branches of the supermodule onto my master branch.  Each remote
branch has added the same submodule, cloned from third-party
repositories whose clone history goes back to the same origin.  (The
example I have in mind is when some project switches to git from some
other SCM, and the maintainers of the remote branches port their
integration patches over from their git-svn tracker submodule to a
clone of upstream's new git repo.)  I should be able to postpone the
merge effort, come back later and clone the upstream repo, then merge
the non-canonical commits that were pulled earlier.

I might want to decide at supermodule pull time to postpone pulling
the bodies of the submodule commits; but I want the full sequence of
submodule commit IDs in the supermodule commit object.  So it's not so
much the supermodule _state_ that has a hierarchical structure; it's
the supermodule _diffs_ and _object_pool_ that become hierarchical.

> The advantage of a global namespace is that you can much more easily
> update it - "git fetch" will just fetch the new file(s) that describe the
> subprojects very naturally if they are all global. Putting them in a local
> .git/config file has its advantages (see above), but it also makes it
> very hard to version them, and to update the list - it would have to
> become manual.

I think the only global-to-local-namespace mapping applies to the
different labels for the "empty repository" state generated at init-db
time.  Given the init-db SHA1 of the linux kernel repository, I should
be able to choose any mirror or clone of that repository as a source
for objects in its "global set".  I expect this provider not to
scribble on globally published branches, but that isn't even all that
critical; anything outside the canonical set is kept in the
supermodule's object pool, so I can always blow the submodule away and
regenerate it from a different mirror.

> There are possibly combinations of the two approaches: have a "global
> namespace" that describes the canonical place to get the subprojects, but
> have some way to add local "translation" of the canonical names into
> locally preferred versions (eg you could just have a way to say "this is
> the local mirror for that global canonical place")
>
> Maybe that would work?

Sure.  But all you really need from the canonical place is its init-db
SHA1 (permanent) and its list of globally published branches
(monotonically expanding).  A URL for it is a convenient shorthand but
doesn't have to be persistent.

Cheers,


* Re: [RFC] Submodules in GIT
  2006-12-03 17:54                                                                                           ` Linus Torvalds
@ 2006-12-04 20:26                                                                                             ` Torgil Svensson
  2006-12-04 20:41                                                                                               ` Linus Torvalds
  0 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-04 20:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf-gmane, sf, git, Martin Waitz

On 12/3/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> > If you have an old irrelevant submodule in the history that happens to
> > have the same name as one of them you are interested in, do you get
> > this as well?
>
> It could be "solved" by simply having the requirement that all modules
> need to be named differently (notice that "module name" is _not_ the same
> thing as "the directory name where the module shows up").

Okay, missed that part.  I wasn't familiar with contents of the CVS
modules files and misinterpreted your suggestion.

MODULE [OPTIONS] [&OTHERMODULE...] [DIR] [FILES]

So all this is UI only and the "normal" operations on the supermodule
will just ignore what's behind the commit-links?


* Re: [RFC] Submodules in GIT
  2006-12-04 20:26                                                                                             ` Torgil Svensson
@ 2006-12-04 20:41                                                                                               ` Linus Torvalds
  2006-12-04 21:36                                                                                                 ` Torgil Svensson
  2006-12-05 10:38                                                                                                 ` Andreas Ericsson
  0 siblings, 2 replies; 252+ messages in thread
From: Linus Torvalds @ 2006-12-04 20:41 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: sf-gmane, sf, git, Martin Waitz

On Mon, 4 Dec 2006, Torgil Svensson wrote:
> 
> Okay, missed that part.  I wasn't familiar with contents of the CVS
> modules files and misinterpreted your suggestion.
> 
> MODULE [OPTIONS] [&OTHERMODULE...] [DIR] [FILES]
> 
> So all this is UI only and the "normal" operations on the supermodule
> will just ignore what's behind the commit-links?

Right. That's how CVS modules work (although in the case of CVS modules, 
the "dir" thing is obviously there in the "modules" file, so it's not 
_purely_ UI in CVS - this would likely be different in a git 
implementation, because the "tree" object ends up telling not just the 
exact version, but the location too).

So my suggestion basically boils down to:

 - "fetch" and "clone" etc will just look at the "modules" file, and 
   recursively fetch/clone whatever the modules file talks about. This is 
   the "thin veneer to make it _look_ like git actually understands 
   submodules" part. It wouldn't really - they're very much tacked on.

 - the tree entries are what makes the "once you have all the submodule 
   objects, this is how you can do 'diff' and 'checkout' on them, and this 
   is what tells you the exact version that goes along with a particular 
   supermodule version".

In other words, the simple and stupid way to do this is to just consider 
these two things two totally independent issues, and have different 
mechanisms for telling different operations what to do.
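
To make the split concrete, here is a small sketch under stated assumptions:
the tree listing uses the placeholder 050000 "link" mode from earlier in this
thread (not a real git mode), and the "modules" file is modelled as a plain
mapping from directory to URL with an example.org placeholder, since its
format has not been specified. The tree entries alone give the pinned
versions; the modules mapping alone says where to fetch from:

```python
LINK_MODE = "050000"  # placeholder mode from the proposal, not a real git mode

LISTING = (
    "100644 blob 08602f522183dc43787616f37cba9b8af4e3dade\txdiff-interface.c\n"
    "040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2\txdiff\n"
    "050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704\txyzzy\n"
)


def parse_tree_listing(text):
    """Parse "mode type sha<TAB>name" lines into tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        meta, name = line.split("\t", 1)
        mode, kind, sha = meta.split()
        entries.append((mode, kind, sha, name))
    return entries


def submodule_pins(entries):
    """Directory name -> pinned submodule commit, from tree entries alone."""
    return {name: sha for mode, kind, sha, name in entries
            if mode == LINK_MODE}


def fetch_plan(modules, pins):
    """Pair each pinned directory with its source from the modules mapping."""
    return [(path, modules[path], sha)
            for path, sha in sorted(pins.items()) if path in modules]


if __name__ == "__main__":
    pins = submodule_pins(parse_tree_listing(LISTING))
    modules = {"xyzzy": "git://example.org/xyzzy.git"}  # hypothetical entry
    for path, url, sha in fetch_plan(modules, pins):
        print(path, url, sha)
```

"diff" and "checkout" would only ever need `submodule_pins`; "fetch" and
"clone" would only ever need the modules mapping.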

Is it "pretty"? No. The whole sub-module thing wouldn't be a tightly 
integrated low-level thing, it would very much be all about tracking 
multiple _separate_ git repositories, and just make them work well 
together. They'd very much still be separate, with just some simple 
infrastructure glue to make them look somewhat integrated.

So yeah, it's a bit hacky, but for the reasons I've tried to outline, I 
actually think that users _want_ hacky. Exactly because "deep integration" 
ends up having so many _bad_ features, so it's better to have a thin and 
simple layer that you can actually see past if you want to.


* Re: [RFC] Submodules in GIT
  2006-12-04 20:41                                                                                               ` Linus Torvalds
@ 2006-12-04 21:36                                                                                                 ` Torgil Svensson
  2006-12-05 10:42                                                                                                   ` Andreas Ericsson
  2006-12-05 10:38                                                                                                 ` Andreas Ericsson
  1 sibling, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-04 21:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf-gmane, sf, git, Martin Waitz

On 12/4/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> So yeah, it's a bit hacky, but for the reasons I've tried to outline, I
> actually think that users _want_ hacky. Exactly because "deep integration"
> ends up having so many _bad_ features, so it's better to have a thin and
> simple layer that you can actually see past if you want to.

Thin and simple sounds very good. Let's try it with an example. Let's
say we have one application App1 and three libraries (Lib1, Lib2, Lib3)
with the following dependency graph:

        App1
        /  \
       /    \
    Lib1    Lib2
       \    /
        \  /
        Lib3 (not really needed for this example, but looks nice)

All components can be used individually and have their own upstream,
maintainer etc.

To compile App1, however, I need some files from both Lib1 and Lib2
specifying their APIs. To satisfy these dependencies, it sounds
reasonable to link Lib1 and Lib2 as submodules from App1. In your
concept, can I construct a modules file to fetch the API files and


* Re: [RFC] Submodules in GIT
  2006-12-04 18:56                                                                                       ` Michael K. Edwards
@ 2006-12-05  1:31                                                                                         ` Sam Vilain
  0 siblings, 0 replies; 252+ messages in thread
From: Sam Vilain @ 2006-12-05  1:31 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Linus Torvalds, Josef Weidendorfer, sf, git, Martin Waitz

Michael K. Edwards wrote:
> who don't want to run kernel.org-scale mirrors.  To make this work,
> you need sparse repositories (conserving resources when fetching, by
> omitting the bulk of currently un-needed submodules that can reliably
> be obtained later from elsewhere) and shallow cloning (conserving
> resources when publishing, by referring cloners to a third-party
> repository for universally available content).

Did you see GitTorrent?  http://gittorrent.utsl.gen.nz/  A lot of
similar ideas to what you mention.  Sorry, still no prototype :)

I'd see the submodules thing as a good way to glue together a whole
bunch of repositories, so that the core mirror servers only have to
mirror a small-ish number of repositories.

Sam.


* Re: [RFC] Submodules in GIT
  2006-12-02 19:41                                                                                       ` Linus Torvalds
  2006-12-03  9:19                                                                                         ` Torgil Svensson
  2006-12-03 19:33                                                                                         ` Andy Parkins
@ 2006-12-05  2:33                                                                                         ` Daniel Barkalow
  2006-12-05 22:07                                                                                           ` sf
  2006-12-09 21:34                                                                                         ` R. Steve McKown
  3 siblings, 1 reply; 252+ messages in thread
From: Daniel Barkalow @ 2006-12-05  2:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Torgil Svensson, sf-gmane, sf, git, Martin Waitz

On Sat, 2 Dec 2006, Linus Torvalds wrote:

> So that's where I come from. And maybe I'm totally wrong. I'd like to hear 
> what people who actually _use_ submodules think.

I think you'd rather hear from people who _would_ use submodules; I've 
worked on a number of projects that would have benefitted from that 
general functionality, but nobody trusted the implementation enough to 
actually use it.

At my work, we're doing a bunch of stuff with microcontrollers. We've got 
about a dozen different boards with microcontrollers, and each of them has 
different firmware. We also have a bunch of code that can go on any of the 
boards.

The way things are organized currently is that each board has its own 
project, and there's a "common-micro" project with the common code. This 
sort of works, but it means that when you change things in common-micro, 
you never know what effect this will have on boards other than the one 
you're actually working on. What I'd like to have is that each project has 
a "common-micro" subdirectory, and changes to each of these can be merged 
into each other, but that doesn't happen automatically, and each board's 
revisions include the common-micro revision they were created with.

A few notes: 

I'd never work on common-micro in isolation. Nothing in there even 
compiles by itself, because the compiler needs to know the target 
microcontroller type, which depends on the board it's for. It only makes 
sense to prepare a new revision of "common-micro" in the context of some 
particular board, at least if you want to test it at all.

I'd sometimes want to include temporary hacks in the common-micro for a 
particular board, when things are late and I need to change some library 
behavior in a way that I know works for the board I'm working on, but I 
don't have time to think about all of the other boards (each of which is 
special in some way).

I'd often make some change that I know improves the cleanliness of 
common-micro, but which requires changes to every board to compensate. I 
don't want to make all of the changes at once; I'll update each board 
appropriately the next time I work on it. But, of course, until I update 
each board, that board needs to keep using the version of common-micro 
without the changes.

I don't want to have repository states where stuff doesn't work, which 
means I can't do it as one big tree; I need to be able to make a commit 
with board1 and a common-micro change without having a version of board2 
that would use the changed common-micro, because I haven't come up with a 
board2 version that works with it yet.

	-Daniel


* Re: [RFC] Submodules in GIT
  2006-11-30 18:57                                               ` Andreas Ericsson
  2006-12-01  8:49                                                 ` Andy Parkins
  2006-12-01 12:03                                                 ` sf
@ 2006-12-05  9:01                                                 ` Uwe Kleine-Koenig
  2006-12-05 10:33                                                   ` Andreas Ericsson
  2006-12-05 16:00                                                   ` Sven Verdoolaege
  2 siblings, 2 replies; 252+ messages in thread
From: Uwe Kleine-Koenig @ 2006-12-05  9:01 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Martin Waitz, Andy Parkins, git

Hello,

Andreas Ericsson wrote:
> The only problem I'm seeing atm is that the supermodule somehow has to 
> mark whatever commits it's using from the submodule inside the submodule 
> repo so that they effectively become un-prunable, otherwise the 
> supermodule may some day find itself with a history that it can't restore.
One could circumvent that by creating a separate repo for the submodule
at checkout time and pulling the needed objects into the supermodule's odb
when committing the supermodule.  This way a prune in the submodule cannot
do any harm, because the supermodule no longer depends on any object in
the submodule's odb.

Uwe

-- 
Uwe Kleine-Koenig



* Re: [RFC] Submodules in GIT
  2006-12-05  9:01                                                 ` Uwe Kleine-Koenig
@ 2006-12-05 10:33                                                   ` Andreas Ericsson
  2006-12-05 11:11                                                     ` Jakub Narebski
  2006-12-05 15:02                                                     ` Uwe Kleine-Koenig
  2006-12-05 16:00                                                   ` Sven Verdoolaege
  1 sibling, 2 replies; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-05 10:33 UTC (permalink / raw)
  To: Uwe Kleine-Koenig, Andreas Ericsson, Martin Waitz, Andy Parkins,
	git

Uwe Kleine-Koenig wrote:
> Hello,
> 
> Andreas Ericsson wrote:
>> The only problem I'm seeing atm is that the supermodule somehow has to 
>> mark whatever commits it's using from the submodule inside the submodule 
>> repo so that they effectively become un-prunable, otherwise the 
>> supermodule may some day find itself with a history that it can't restore.
> One could circumvent that by creating a separate repo for the submodule
> at checkout time and pull the needed objects in the supermodule's odb
> when commiting the supermodule.  This way prune in the submodule cannot
> do any harm, because in it's odb are no objects that are important for
> the supermodule.
> 

Yes, but then you'd lose history connectivity (I'm assuming you'd only 
pull in the tree and blob objects from the submodule, and prefix the 
tree entries with whatever directory you're storing the submodule in).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-04 20:41                                                                                               ` Linus Torvalds
  2006-12-04 21:36                                                                                                 ` Torgil Svensson
@ 2006-12-05 10:38                                                                                                 ` Andreas Ericsson
  2006-12-05 11:01                                                                                                   ` Jakub Narebski
  1 sibling, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-05 10:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Torgil Svensson, sf-gmane, sf, git, Martin Waitz

Linus Torvalds wrote:
> 
> On Mon, 4 Dec 2006, Torgil Svensson wrote:
>> Okay, missed that part.  I wasn't familiar with contents of the CVS
>> modules files and misinterpreted your suggestion.
>>
>> MODULE [OPTIONS] [&OTHERMODULE...] [DIR] [FILES]
>>
>> So all this is UI only and the "normal" operations on the supermodule
>> will just ignore what's behind the commit-links?
> 
> Right. That's how CVS modules work (although in the case of CVS modules, 
> the "dir" thing is obviously there in the "modules" file, so it's not 
> _purely_ UI in CVS - this would likely be different in a git 
> implementation, because the "tree" object ends up telling not just the 
> exact version, but the location too).
> 
> So my suggestion basically boils down to:
> 
>  - "fetch" and "clone" etc will just look at the "modules" file, and 
>    recursively fetch/clone whatever the module files talks about. This is 
>    the "thin veneer to make it _look_ like git actually understands 
>    submodules" part. It woudln't really - they're very much tacked on.
> 
>  - the tree entries are what makes the "once you have all the submodule 
>    objects, this is how you can do 'diff' and 'checkout' on them, and this 
>    is what tells you the exact version that goes along with a particular 
>    supermodule version".
> 
> In other words, the simple and stupid way to do this is to just consider 
> these two things two totally independent issues, and have different 
> mechanisms for telling different operations what to do.
> 
> Is it "pretty"? No. The whole sub-module thing wouldn't be a tightly 
> integrated low-level thing, it would very much be all about tracking 
> multiple _separate_ git repositories, and just make them work well 
> together. They'd very much still be separate, with just some simple 
> infrastructure glue to make them look somewhat integrated.
> 
> So yeah, it's a bit hacky, but for the reasons I've tried to outline, I 
> actually think that users _want_ hacky. Exactly because "deep integration" 
> ends up having so many _bad_ features, so it's better to have a thin and 
> simple layer that you can actually see past if you want to.
> 

Indeed. With the "tight" integration option we'd also have to have the 
mechanism to rewrite the tree-entries with the location where the 
submodule is located in the working tree. This might be needed anyways, 
but it sure as hell seems a lot easier to just tack that part on when 
doing a checkout and actually creating all the files.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-04 21:36                                                                                                 ` Torgil Svensson
@ 2006-12-05 10:42                                                                                                   ` Andreas Ericsson
  2006-12-05 11:09                                                                                                     ` Jakub Narebski
  0 siblings, 1 reply; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-05 10:42 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: Linus Torvalds, sf-gmane, sf, git, Martin Waitz

Torgil Svensson wrote:
> On 12/4/06, Linus Torvalds <torvalds@osdl.org> wrote:
>>
>> So yeah, it's a bit hacky, but for the reasons I've tried to outline, I
>> actually think that users _want_ hacky. Exactly because "deep 
>> integration"
>> ends up having so many _bad_ features, so it's better to have a thin and
>> simple layer that you can actually see past if you want to.
> 
> Thin and simple sounds very good. Let's try it with an example. Lets
> say we have one apllication App1 and three librarys (Lib1, Lib2, Lib3)
> with the following dependency-graph:
> 
>        App1
>          /\
>         /  \
>   Lib1   Lib2
>       \     /
>        \   /
>        Lib3 (don't really needed for this example but looks nice)
> 
> All components can be used individually and have their own upstream,
> maintainer etc.
> 
> To compile App1 however, I need some files from both Lib1 and Lib2
> specifying it's API. To satisfy these dependencies, It sounds
> reasonable to link Lib2 and Lib3 submodules from App1. In your
> concept, can I construct a modules file to fetch the API files and
> their history without checking out the whole Lib1 and Lib2 source?

I think not. Then it wouldn't be a submodule anymore, but just some 
random sources from an upstream project. Not that it's an uncommon 
workflow or anything, but it's sort of akin to just importing the SHA1 
implementation (a few source-files with no real interest in the history 
of those source-files) from openssl into a different project rather than 
actually using the entire openssl lib (which would be nice to have as a 
submodule).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-05 10:38                                                                                                 ` Andreas Ericsson
@ 2006-12-05 11:01                                                                                                   ` Jakub Narebski
  0 siblings, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-05 11:01 UTC (permalink / raw)
  To: git

Andreas Ericsson wrote:

> Indeed. With the "tight" integration option we'd also have to have the 
> mechanism to rewrite the tree-entries with the location where the 
> submodule is located in the working tree. This might be needed anyways, 
> but it sure as hell seems a lot easier to just tack that part on when 
> doing a checkout and actually creating all the files.

Excellent idea! This way most of the concerns about the "separate
repositories for submodules" layout, namely the ability to rename the
directory the submodule resides in or to move the submodule, are resolved.
The other part would be to use a submodule-aware git-mv to move submodule(s).

Perhaps the following solution would work best:
 * refs/submodules/<module> holds sha1 of top commit in submodule
 * objects/info/submodules is a file which can be automatically generated
   (or at least automatically updated) on checkout, with the following
   contents:

   <module> TAB or SPC <path to submodule, or GIT_DIR of submodule, or
                        GIT_OBJECT_DIRECTORY of submodule>

   with the usual rules that # and ; start a comment, \ at end of line marks
   a continuation, empty lines don't matter, etc.
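
As a rough illustration of the file format described above, here is a
minimal parser sketch in Python. The name parse_submodules is made up for
this example. It assumes that comments occupy whole lines and that the
first run of TAB/SPC separates the module name from the path; the proposal
does not pin down either detail.

```python
def parse_submodules(text):
    """Parse '<module> <path>' lines into a dict (hypothetical helper)."""
    entries = {}

    def store(line):
        line = line.strip()
        if not line or line[0] in "#;":
            return  # blank line or comment (assumed to be whole-line only)
        parts = line.split(None, 1)  # split on the first run of TAB/SPC
        if len(parts) != 2:
            raise ValueError("expected '<module> <path>': %r" % line)
        entries[parts[0]] = parts[1]

    pending = ""
    for raw in text.splitlines():
        line = pending + raw
        if line.endswith("\\"):
            pending = line[:-1]  # continuation: join with the next line
            continue
        pending = ""
        store(line)
    store(pending)  # a trailing continuation with nothing after it
    return entries
```

The returned dict maps each module name to whatever location string was
recorded for it, so a caller can decide whether to treat that string as a
working-tree path, a GIT_DIR or a GIT_OBJECT_DIRECTORY.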
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [RFC] Submodules in GIT
  2006-12-05 10:42                                                                                                   ` Andreas Ericsson
@ 2006-12-05 11:09                                                                                                     ` Jakub Narebski
  0 siblings, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-05 11:09 UTC (permalink / raw)
  To: git

Andreas Ericsson wrote:

> Torgil Svensson wrote:
>> On 12/4/06, Linus Torvalds <torvalds@osdl.org> wrote:
>>>
>>> So yeah, it's a bit hacky, but for the reasons I've tried to outline, I
>>> actually think that users _want_ hacky. Exactly because "deep 
>>> integration"
>>> ends up having so many _bad_ features, so it's better to have a thin and
>>> simple layer that you can actually see past if you want to.
>> 
>> Thin and simple sounds very good. Let's try it with an example. Lets
>> say we have one apllication App1 and three librarys (Lib1, Lib2, Lib3)
>> with the following dependency-graph:
>> 
>>        App1
>>        /  \
>>       /    \
>>   Lib1   Lib2
>>       \    /
>>        \  /
>>        Lib3 (don't really needed for this example but looks nice)
>> 
>> All components can be used individually and have their own upstream,
>> maintainer etc.
>> 
>> To compile App1 however, I need some files from both Lib1 and Lib2
>> specifying it's API. To satisfy these dependencies, It sounds
>> reasonable to link Lib2 and Lib3 submodules from App1. In your
>> concept, can I construct a modules file to fetch the API files and
>> their history without checking out the whole Lib1 and Lib2 source?
> 
> I think not. Then it wouldn't be a submodule anymore, but just some 
> random sources from an upstream project. Not that it's an uncommon 
> workflow or anything, but it's sort of akin to just importing the SHA1 
> implementation (a few source-files with no real interest in the history 
> of those source-files) from openssl into a different project rather than 
> actually using the entire openssl lib (which would be nice to have as a 
> submodule).

Note that this is what partial checkouts are about (another great idea
nobody has implemented yet[*1*]; you can do a partial checkout, but there is
no UI for it, and working with partial checkouts is a bit hard), although
that would only save you working-area space, not repository (object
database storage) space.

For now, you can imitate this by having in Lib1 and Lib2 an 'includes'
branch which would contain only the API (you would have to keep it
up to date with 'master', but that should be fairly easy: just merge changes
into 'includes', perhaps with the help of git-rerere, or the [nonexistent]
git-rerere2).

[*1*] Although with our track record[*2*] I guess it is reasonable to think it
would get implemented soon.
[*2*] Out of four "great ideas": shallow clone / sparse clone, submodules
support, lazy clone / remote alternates, two are in example-implementation
(submodules support) and beta work (shallow clone is in 'next').

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: [RFC] Submodules in GIT
  2006-12-05 10:33                                                   ` Andreas Ericsson
@ 2006-12-05 11:11                                                     ` Jakub Narebski
  2006-12-05 15:02                                                     ` Uwe Kleine-Koenig
  1 sibling, 0 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-05 11:11 UTC (permalink / raw)
  To: git

Andreas Ericsson wrote:

> Uwe Kleine-Koenig wrote:
>
>> Andreas Ericsson wrote:
>>> The only problem I'm seeing atm is that the supermodule somehow has to 
>>> mark whatever commits it's using from the submodule inside the submodule 
>>> repo so that they effectively become un-prunable, otherwise the 
>>> supermodule may some day find itself with a history that it can't restore.
>>>
>> One could circumvent that by creating a separate repo for the submodule
>> at checkout time and pull the needed objects in the supermodule's odb
>> when commiting the supermodule.  This way prune in the submodule cannot
>> do any harm, because in it's odb are no objects that are important for
>> the supermodule.
> 
> Yes, but then you'd lose history connectivity (I'm assuming you'd only 
> pull in the tree and blob objects from the submodule, and prefix the 
> tree-entrys with whatever directory you're storing the submodul in).

I thought that Uwe meant pulling (getting) _all_ the needed objects from
the submodule object repository into the supermodule object repository:
commits, trees and blobs, full history.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [RFC] Submodules in GIT
  2006-12-05 10:33                                                   ` Andreas Ericsson
  2006-12-05 11:11                                                     ` Jakub Narebski
@ 2006-12-05 15:02                                                     ` Uwe Kleine-Koenig
  2006-12-05 15:30                                                       ` Andreas Ericsson
  1 sibling, 1 reply; 252+ messages in thread
From: Uwe Kleine-Koenig @ 2006-12-05 15:02 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Martin Waitz, Andy Parkins, git

Hello Andreas,

Andreas Ericsson wrote:
> >>The only problem I'm seeing atm is that the supermodule somehow has to 
> >>mark whatever commits it's using from the submodule inside the submodule 
> >>repo so that they effectively become un-prunable, otherwise the 
> >>supermodule may some day find itself with a history that it can't restore.
> >One could circumvent that by creating a separate repo for the submodule
> >at checkout time and pull the needed objects in the supermodule's odb
> >when commiting the supermodule.  This way prune in the submodule cannot
> >do any harm, because in it's odb are no objects that are important for
> >the supermodule.
> 
> Yes, but then you'd lose history connectivity (I'm assuming you'd only 
> pull in the tree and blob objects from the submodule, and prefix the 
> tree-entrys with whatever directory you're storing the submodul in).
That's why I prefer to pull in the complete commit.

I don't understand what you mean by "prefix the tree entries with
whatever directory you're storing the submodule in".
Maybe one of us doesn't understand tree objects correctly.  AFAICT they
don't store the location where they occur, so there is no need to store
a prefix.  E.g.:

	zeisberg@cepheus:/tmp$ mkdir test-repo
	zeisberg@cepheus:/tmp$ cd test-repo/
	zeisberg@cepheus:/tmp/test-repo$ git-init-db 
	defaulting to local storage area
	zeisberg@cepheus:/tmp/test-repo$ echo LD_FLAGS=-ltest > Makefile
	zeisberg@cepheus:/tmp/test-repo$ git add Makefile
	zeisberg@cepheus:/tmp/test-repo$ git commit -m 'test1'
	Committing initial tree 754eadab39642175748bb02155d2959176bcf014
	zeisberg@cepheus:/tmp/test-repo$ mkdir subdir
	zeisberg@cepheus:/tmp/test-repo$ cp Makefile subdir/
	zeisberg@cepheus:/tmp/test-repo$ git add subdir/
	zeisberg@cepheus:/tmp/test-repo$ git commit -m 'test2'
	zeisberg@cepheus:/tmp/test-repo$ git ls-tree HEAD
	100644 blob 610bafd79f92c7e546b104d5b22795df1f099723    Makefile
	040000 tree 754eadab39642175748bb02155d2959176bcf014    subdir

So the tree that only contains the Makefile specifying LD_FLAGS has the
sha1id 754eadab39642175748bb02155d2959176bcf014 regardless of whether it
is the root of my project or a subtree.
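
The same point can be checked without a repository. The sketch below
assumes the usual git object encoding (a "<type> <size>" header, a NUL
byte, then the payload, hashed with SHA-1) and tree entries of the form
"<mode> <name>", NUL, raw 20-byte id. The helper names are made up for
this illustration.

```python
import hashlib


def git_object_id(kind, payload):
    """SHA-1 over '<kind> <size>', a NUL byte, and the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(data):
    return git_object_id("blob", data)


def tree_id(entries):
    """entries: iterable of (mode, name, hex_id); mode '40000' is a subtree."""
    def sort_key(entry):
        mode, name, _ = entry
        # Entries are ordered by name, with directories compared as if
        # their name ended in '/'.
        return (name + "/" if mode == "40000" else name).encode()

    payload = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in sorted(entries, key=sort_key)
    )
    return git_object_id("tree", payload)


makefile = blob_id(b"LD_FLAGS=-ltest\n")
# The tree holding only the Makefile is hashed with no knowledge of
# where (or whether) it will be mounted in another tree.
inner = tree_id([("100644", "Makefile", makefile)])
# Only the parent tree records the name "subdir" for that same id.
outer = tree_id([("100644", "Makefile", makefile),
                 ("40000", "subdir", inner)])
```

The path component lives in the parent's entry, not in the child tree, so
the same tree object can serve as a project root in one place and as a
subdirectory in another.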

But maybe I misunderstood you?

Best regards
Uwe

-- 
Uwe Kleine-Koenig

If a lawyer and an IRS agent were both drowning, and you could only save


* Re: [RFC] Submodules in GIT
  2006-12-05 15:02                                                     ` Uwe Kleine-Koenig
@ 2006-12-05 15:30                                                       ` Andreas Ericsson
  0 siblings, 0 replies; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-05 15:30 UTC (permalink / raw)
  To: Uwe Kleine-Koenig, Andreas Ericsson, Martin Waitz, Andy Parkins,
	git

Uwe Kleine-Koenig wrote:
> Hella Andreas,
> 
> Andreas Ericsson wrote:
>>>> The only problem I'm seeing atm is that the supermodule somehow has to 
>>>> mark whatever commits it's using from the submodule inside the submodule 
>>>> repo so that they effectively become un-prunable, otherwise the 
>>>> supermodule may some day find itself with a history that it can't restore.
>>> One could circumvent that by creating a separate repo for the submodule
>>> at checkout time and pull the needed objects in the supermodule's odb
>>> when commiting the supermodule.  This way prune in the submodule cannot
>>> do any harm, because in it's odb are no objects that are important for
>>> the supermodule.
>> Yes, but then you'd lose history connectivity (I'm assuming you'd only 
>> pull in the tree and blob objects from the submodule, and prefix the 
>> tree-entrys with whatever directory you're storing the submodul in).
> That's the reason for me prefering to pull in the complete commit.
> 
> I don't understand what you mean with "prefix the tree-entrys with
> whatever directory you're storing the submodul in".
> Maybe one of us doesn't understand tree objects correctly.  AFAICT they
> don't store the location where they occur, so there is no need to store
> a prefix.  E.g. 
> 
> 	100644 blob 610bafd79f92c7e546b104d5b22795df1f099723    Makefile
> 	040000 tree 754eadab39642175748bb02155d2959176bcf014    subdir
> 
> So the tree that only contains the Makefile specifing LD_FLAGS has the
> sha1id 754eadab39642175748bb02155d2959176bcf014 independent of being the
> root of my project or a subtree.
> 
> But maybe I misunderstood you?
> 

Nopes. I just didn't think of the fact that subtrees are trees and never 
store any path-info no matter what. So basically the supermodule can 
store all trees of all submodules for each commit adding a new submodule 
revision (which is neat, since "casuals" never have to bother with 
getting all the submodules if they want to see all the code used in any 
particular revision), while we invent the new tree object "subm" that 
points to a commit in the submodule repo. We then teach the tools to 
recognize when the *real* submodule repo is present and just don't check 
out trees from the supermodule odb that lead us to directories where 
submodules reside. Simple and beautiful. Me likes.
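
To make the checkout rule above concrete, here is a small in-memory
sketch. Everything in it is hypothetical: the "subm" entry kind is the
one proposed in this mail, and Entry, plan_checkout and the dictionaries
standing in for the object database are invented for the example. It only
decides which paths would be written from the supermodule's odb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kind: str    # "blob", "tree", or the proposed (hypothetical) "subm"
    name: str
    target: str  # object id; for "subm", a commit id in the submodule


def plan_checkout(tree, trees, submodule_tree, has_own_repo, prefix=""):
    """List the file paths to write from the supermodule's odb.

    trees          -- dict: tree id -> list of Entry
    submodule_tree -- dict: submodule commit id -> tree id kept in the
                      supermodule's odb
    has_own_repo   -- callable(path) -> True if a real submodule
                      repository is present at that path
    """
    paths = []
    for entry in trees[tree]:
        path = prefix + entry.name
        if entry.kind == "blob":
            paths.append(path)
        elif entry.kind == "tree":
            paths += plan_checkout(entry.target, trees, submodule_tree,
                                   has_own_repo, path + "/")
        elif entry.kind == "subm":
            if has_own_repo(path):
                continue  # leave this directory to the submodule's repo
            paths += plan_checkout(submodule_tree[entry.target], trees,
                                   submodule_tree, has_own_repo, path + "/")
    return paths
```

A casual user without the submodule repository would get every file from
the supermodule's odb, while a developer with the real submodule checked
out would have that directory skipped.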

*Whether* we teach the history viewers about submodules is a different matter 
though. I'm not sure it would make much sense to have simple text-mode 
browsers show the submodule history, although I can imagine qgit and 
gitk wanting to take advantage of their nice side-by-side DAG-displaying 
code to show all the repos in parallel, or link between them in some 
point-and-click kind of way.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: [RFC] Submodules in GIT
  2006-12-05  9:01                                                 ` Uwe Kleine-Koenig
  2006-12-05 10:33                                                   ` Andreas Ericsson
@ 2006-12-05 16:00                                                   ` Sven Verdoolaege
  1 sibling, 0 replies; 252+ messages in thread
From: Sven Verdoolaege @ 2006-12-05 16:00 UTC (permalink / raw)
  To: Uwe Kleine-Koenig, Andreas Ericsson, Martin Waitz, Andy Parkins,
	git

On Tue, Dec 05, 2006 at 10:01:25AM +0100, Uwe Kleine-Koenig wrote:
> Hello,
> 
> Andreas Ericsson wrote:
> > The only problem I'm seeing atm is that the supermodule somehow has to 
> > mark whatever commits it's using from the submodule inside the submodule 
> > repo so that they effectively become un-prunable, otherwise the 
> > supermodule may some day find itself with a history that it can't restore.
> One could circumvent that by creating a separate repo for the submodule
> at checkout time and pull the needed objects in the supermodule's odb
> when commiting the supermodule.  This way prune in the submodule cannot
> do any harm, because in it's odb are no objects that are important for
> the supermodule.

I _think_ Linus argued against doing this (for scalability reasons),
although he didn't actually answer my question when I asked him directly.
In his proposal you wouldn't need to do this, because the particular
checked-out copy of the submodule that is located in a subdirectory
of a superproject would not be allowed to be pruned.  It seems that
Martin has also implemented it like this.
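
The pruning concern can be pictured as a reachability question. The
sketch below is only a toy model of commits and parent links, not how
git-prune is implemented, and the names reachable, prunable and
pinned_by_superproject are invented for the example. It assumes the
commits recorded by the superproject are simply treated as additional
roots next to the submodule's own branch tips.

```python
def reachable(roots, parents_of):
    """All commits reachable from roots by following parent links."""
    seen = set()
    stack = list(roots)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents_of.get(commit, ()))
    return seen


def prunable(all_commits, branch_tips, pinned_by_superproject, parents_of):
    """Commits reachable from neither the branch tips nor the pinned ones."""
    roots = list(branch_tips) + list(pinned_by_superproject)
    return set(all_commits) - reachable(roots, parents_of)
```

With an empty pinned list, a submodule commit that only the superproject
refers to comes out as prunable; once it is passed in as a pinned root,
it and its ancestors are kept.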



* Re: [RFC] Submodules in GIT
  2006-12-05  2:33                                                                                         ` Daniel Barkalow
@ 2006-12-05 22:07                                                                                           ` sf
  0 siblings, 0 replies; 252+ messages in thread
From: sf @ 2006-12-05 22:07 UTC (permalink / raw)
  To: git

Daniel Barkalow wrote:
> On Sat, 2 Dec 2006, Linus Torvalds wrote:
> 
>> So that's where I come from. And maybe I'm totally wrong. I'd like to hear 
>> what people who actually _use_ submodules think.
> 
> I think you'd rather hear from people who _would_ use submodules; I've 
> worked on a number of projects that would have benefitted from that 
> general functionality, but nobody trusted the implementation enough to 
> actually use it.
> 
> At my work, we're doing a bunch of stuff with microcontrollers. We've got 
> about a dozen different boards with microcontrollers, and each of them has 
> different firmware. We also have a bunch of code that can go on any of the 
> boards.
> 
> The way things are organized currently is that each board has its own 
> project, and there's a "common-micro" project with the common code. This 
> sort of works, but it means that when you change things in common-micro, 
> you never know what effect this will have on boards other than the one 
> you're actually working on. What I'd like to have is that each project has 
> a "common-micro" subdirectory, and changes to each of these can be merged 
> into each other, but that doesn't happen automaticly, and each board's 
> revisions include the common-micro revision they were created with.

Our setup and requirements at work are exactly the same: We have a few
main projects that are developed independently and we have one "helper"
project for code that is general enough to be reused. So work on the
helper project is only done while working on one of the main projects.
When we switch to another main project we integrate the changes to the
"helper" project.

That's the theory, at least.

Regards

Stephan


* Re: [RFC] Submodules in GIT
  2006-12-01 20:13                                                                         ` Linus Torvalds
                                                                                             ` (2 preceding siblings ...)
  2006-12-01 22:35                                                                           ` sf
@ 2006-12-08 18:29                                                                           ` Jon Loeliger
  2006-12-08 18:45                                                                             ` Sven Verdoolaege
  2006-12-12  8:32                                                                             ` Andreas Ericsson
  3 siblings, 2 replies; 252+ messages in thread
From: Jon Loeliger @ 2006-12-08 18:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sf, Git List, Martin Waitz

On Fri, 2006-12-01 at 14:13, Linus Torvalds wrote:

> So this is why it's really important that the submodule really is a git 
> repository in its own right, and why committing stuff in the supermodule 
> NEVER affect the submodule itself directly (it might _cause_ you to also 
> do a commit in the submodule indirectly, but the submodule commit MUST be 
> totally independent, and stand on its own).

An implication of this is that the administrative
responsibility for any super/sub-module interaction
lies entirely with the supermodule.

Why not have a "glue" object at the "stub"-interface of
the supermodule tree that provides policy mappings to
the sub-modules?  It could indicate the git URL location,
mappings of branch names between super- and sub-modules,
special commit SHA1s, user policy or config choices at
the boundary, and things like that.

Is that the sort of direction we are headed?

jdl


* Re: [RFC] Submodules in GIT
  2006-12-08 18:29                                                                           ` Jon Loeliger
@ 2006-12-08 18:45                                                                             ` Sven Verdoolaege
  2006-12-12  8:32                                                                             ` Andreas Ericsson
  1 sibling, 0 replies; 252+ messages in thread
From: Sven Verdoolaege @ 2006-12-08 18:45 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: Linus Torvalds, sf, Git List, Martin Waitz

On Fri, Dec 08, 2006 at 12:29:14PM -0600, Jon Loeliger wrote:
> Why not have a "glue" object at the "stub"-interface of
> the supermodule tree that provides policy mappings to
> the sub-modules.  Perhaps indicating git URL location,
> mappings of branch names between super- and sub- modules,
> special commit SHA1s, user policy or config choices at
> the boundary, and things like that.
> 
> Is that the sort of direction we are headed?

Not unless you have something useful in mind that could be put in
these glue objects.  URLs and branch names, in particular, should
not be stored in the repository itself, but in configuration files,
since they will be different for different copies of the repo.


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-02 19:41                                                                                       ` Linus Torvalds
                                                                                                           ` (2 preceding siblings ...)
  2006-12-05  2:33                                                                                         ` Daniel Barkalow
@ 2006-12-09 21:34                                                                                         ` R. Steve McKown
  2006-12-10 11:47                                                                                           ` Torgil Svensson
  3 siblings, 1 reply; 252+ messages in thread
From: R. Steve McKown @ 2006-12-09 21:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Saturday 02 December 2006 12:41 pm, Linus Torvalds wrote:
> In other words, I _suspect_ that that is really what module users are all
> about. They want the ability to specify an arbitrary collection of these
> atomic snapshots (for releases etc), and just want a way to copy and move
> those things around, and are less interested in making everything else
> very seamless (because most people are happy to do the actual
> _development_ entirely within the submodules, so the "development" part
> is actually not that important for the supermodule, the supermodule is
> mostly for aggregation and snapshots, and tying different versions of
> different submodules together).
>
> So that's where I come from. And maybe I'm totally wrong. I'd like to hear
> what people who actually _use_ submodules think.

Here are some thoughts on subprojects from my company's perspective.  I
apologize for the long message.

Abstract: We use submodules heavily in CVS and SVN.  I like what I've read 
from Linus about the "thin veneer" approach of integrating subprojects.  It 
seems conceptually to provide the support we desire.  For us, it's important 
that the mandated linkage between a master project and a subproject is 
minimal to maximize our flexibility in building our processes.


We develop and maintain a lot of embedded applications, both for higher-level
systems (ex: 32MB RAM/32MB storage) running the Linux kernel and a customized
set of libs/app support code, and for more deeply embedded environments (ex: 8KB
of RAM and 32KB of storage).  Even though these two cases are very different
in many respects, the version management issues are the same.

- We (mostly) track everything needed to build historical versions of code 
with 100% fidelity.  This includes all of the tools used to compile, build, 
test, deploy, debug, etc. the actual build results themselves.  I initially 
looked at Vesta several years ago.  I love their conceptual approach to this 
problem (integrated build system that caches mid-level build results within 
the repository itself), but it's too unwieldy, very hard to set up (lots of 
up-front effort), and lacks many useful features.

- Most of our "applications" are a relatively small amount of app-specific 
code with references to several/many shared modules.  Shared modules can 
contain support tools, like build/test/debug/deploy support for a given 
embedded platform, in-house developed shared app code, or shared code 
developed by third parties.

- We use CVS to manage our larger system development projects.  The repo is 
about 2GB and has several dozen application-code submodules.  We use the 
"third party sources" approach to tracking submodules as outlined in Ch.13 of 
the CVS manual.  Additionally, we manage our "buildbox" (similar to buildroot
in concept) in another CVS repo.  All prior interesting versions of the
buildroot can be built from source (toolchains, everything), if necessary.
Applications contain metadata (a file...) in the repo so the app-level build
system can ensure it is being run under the correct version of buildbox;
clunky but serviceable.  CVS is a nightmare because of its poor 
branch/tagging facilities, and many of the things we *ought* to be doing with 
revision control we don't because of the complexity.

- We use SVN to manage our deeply embedded system projects.  The repo is about 
250MB in size.  Applications use the svn:externals property to reference 
needed modules.  We aren't using a buildbox in this environment yet (bad!).  
SVN's simple branching and svn:externals are a giant leap forward in 
comparison to CVS's capabilities.


Below are some common use case scenarios that are to varying degrees unwieldy
in CVS and/or SVN.  Many of these involving non-trivial branching and merging 
operations are nearly impractical in CVS, and the lack of merge tracking (to 
support repeated safe merging from one branch to another) makes some of these 
a bit tricky in SVN too.  Of course neither repo supports 
disconnected/distributed operation, which would make a number of activities 
that much simpler as well.

- Round trip module management.  A specific app requires a change to a shared 
module, so it makes a local branch to develop the change.  The "diff" is 
presented to the maintainer (who may be in-house).  The next interesting
maintainer version of the module gets imported into our repo (if in house, 
it's already there), where the app can reference it.  This merge process may 
leave changes not yet implemented (or never to be implemented) by the module 
maintainer in the local branch used by the apps.  Other apps are unaffected, 
as they are linking to a prior version in the local branch.

- Pragmatic development.  It's typical that in developing an application, a 
developer will need to simultaneously make changes to one or more submodules.  
If more than trivial, he/she should branch the submodules and continually
track the HEAD of those branches in the relevant app.  This is so complex
and fraught with problems in CVS that it doesn't get done, and developers 
house too much change over time in their working directories.  With SVN and 
svn:externals, the process is workable.  It is nice that an svn:external can 
point to (the HEAD of) a branch when making changes.

- An application implements a new feature internally (say support for a new 
digital chipset in the embedded world) which later needs to be "promoted" to 
a subproject for use by others.  Pretty easy in SVN.  A challenge in CVS; 
it's really not possible to "convert" app code into a "third party source" 
and retain an historical link.

- Updating build tools.  In concept no different than updating a shared code 
module.  In practice, due to the buildbox strategy, it's a bit convoluted.  I 
don't expect this to get much smoother.  Getting Vesta-like features, where 
integrated build support can cache lower-level build results in a version-safe
manner (like the binary code built when the cross toolchain was built) would 
be killer, but that's surely OT for the submodules discussion.

Thanks,

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-09 21:34                                                                                         ` R. Steve McKown
@ 2006-12-10 11:47                                                                                           ` Torgil Svensson
  2006-12-14 21:27                                                                                             ` Torgil Svensson
  0 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-10 11:47 UTC (permalink / raw)
  To: R. Steve McKown; +Cc: Linus Torvalds, git

What if we use Linus's "module" file concept and allow the link objects
to track subtrees? An object may look like this:

commit: <SHA1>
link: <SHA1> /path/to/remote/tree/or/blob


Tracking upstream library:
--------------------------
clone as usual


Inhouse libraries/applications:
-------------------------------
To satisfy versioning of build-dependencies - make links of type
"external/lib1_header.h" -> "<commit>/headers/lib1_header.h" (blob)
"external/lib1_interface" -> "<commit>/api" (tree)

If git supports "sparse fetching" of subtrees we can follow the
history in the submodule only concerning the files we want without
fetching the whole subtree. "modules" file could specify something
like "always clone on fetch"


Build environment
------------------
First make links to all tools, applications, etc ...
"buildtools/random_app1" -> "<commit>/"
"buildtools/random_app2" -> "<commit>/"
"sub_build_projects/user_interface" -> "<commit>/"
"sub_build_projects/kernel" -> "<commit>/"
"apps/special_app1" -> "<commit>/"
"libs/special_lib1" ->
"<commit-from-another-build-project>/special/lib/binary/path"

Here we can have a build system that for example creates a "i386"
folder and the repo itself


Documentation release
----------------------
"Lib1/" -> "<lib1 commit>/docs"
"Lib2/" -> "<lib2 commit>/docs"
"App1/" -> "<app1 commit>/docs"


Special customer release for a specific HW platform
---------------------------------------------------
"Lib1/lib1.h" -> "<lib1-commit>/headers/lib1.h"
"Lib1/lib1.so" -> "<build-environment-commit>/i386/Lib1/lib1.so"
"Lib1/docs" -> "<lib1-commit>/docs"
"App1_binary" -> "<build-environment-commit>/i386/App1/App1_binary"
"docs" -> "<app1-commit>/docs"

Commit, tag and bag this and send it to the customer. If the customer says
something is broken, we can compute a SHA1 of the customer's tree and
immediately see if there are changes not belonging to us.
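
The tree comparison described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the nested
dict layout and the names `git_object_id` and `tree_id` are made up for
this example, and every file is treated as a regular non-executable blob.
The object header format ("<type> <size>" followed by a NUL byte) and the
tree entry layout are the ones git itself uses.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """SHA-1 of a git object: header "<kind> <size>" + NUL, then the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def tree_id(tree: dict) -> str:
    """Hash a nested dict {name: bytes | dict} the way git hashes a tree.

    Simplification (assumption): every file is a regular blob with mode 100644.
    """
    entries = []
    for name, node in tree.items():
        if isinstance(node, dict):
            mode, oid = b"40000", tree_id(node)
            sort_key = name.encode() + b"/"  # git sorts directories as "name/"
        else:
            mode, oid = b"100644", git_object_id("blob", node)
            sort_key = name.encode()
        entry = mode + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
        entries.append((sort_key, entry))
    payload = b"".join(entry for _, entry in sorted(entries))
    return git_object_id("tree", payload)


# Hypothetical release tree and the tree the customer sends back.
shipped = {"docs": {"README": b"release notes\n"}, "App1_binary": b"\x7fELF..."}
returned = {"docs": {"README": b"release notes (edited)\n"}, "App1_binary": b"\x7fELF..."}

print(tree_id(shipped) == tree_id(returned))  # False: the customer's tree differs
```

Because a tree id covers every blob beneath it, a single comparison is
enough to tell whether anything in the delivered tree was modified.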


Now this can be broken in so many ways that I can't even count, so I
appreciate some feedback to correct my head.



^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-08 18:29                                                                           ` Jon Loeliger
  2006-12-08 18:45                                                                             ` Sven Verdoolaege
@ 2006-12-12  8:32                                                                             ` Andreas Ericsson
  1 sibling, 0 replies; 252+ messages in thread
From: Andreas Ericsson @ 2006-12-12  8:32 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: Linus Torvalds, sf, Git List, Martin Waitz

Jon Loeliger wrote:
> On Fri, 2006-12-01 at 14:13, Linus Torvalds wrote:
> 
>> So this is why it's really important that the submodule really is a git 
>> repository in its own right, and why committing stuff in the supermodule 
>> NEVER affect the submodule itself directly (it might _cause_ you to also 
>> do a commit in the submodule indirectly, but the submodule commit MUST be 
>> totally independent, and stand on its own).
> 
> An implication of this is that the entire administrative
> responsibility for having some super-sub module interaction
> lies entirely with the supermodule.
> 

That's a good thing. I wouldn't want the openssl maintainers to have to 
bother with every project that uses their code, and I'm fairly certain 
they feel the same.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-10 11:47                                                                                           ` Torgil Svensson
@ 2006-12-14 21:27                                                                                             ` Torgil Svensson
  2006-12-14 23:07                                                                                               ` Josef Weidendorfer
  0 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-14 21:27 UTC (permalink / raw)
  To: R. Steve McKown; +Cc: Linus Torvalds, git

On 12/10/06, Torgil Svensson <torgil.svensson@gmail.com> wrote:
> What if we use linus "module" file concept and allow the link objects
> to track subtrees? An object may look like this:
>
> commit: <SHA1>
> link: <SHA1> /path/to/remote/tree/or/blob

> Special customer release for a specific HW platform
> ---------------------------------------------------
> "Lib1/lib1.h" -> "<lib1-commit>/headers/lib1.h"
> "Lib1/lib1.so" -> "<build-environment-commit>/i386/Lib1/lib1.so"
> "App1_binary" -> "<build-environment-commit>/i386/App1/App1_binary"

This example is somewhat complex since the build for lib1.so and the
header-file might not have gone through the same commit on the lib1
subproject.  Consider this example:


lib1 - library project (source tracking)
------------------------------------------
Blob: /src/lib1.h


app1 - application project (source tracking)
-----------------------------------------
Link: /headers/lib1.h -> <lib1-commit1>/src/lib1.h


build1 - Build project (binary build tracking)
------------------------------------
Link: /src/lib1 -> <lib1-commit2>/
Link: /src/app1 -> <app1-commit>/
Blob: /i386/lib1/lib1.so
Blob: /i386/app1/app1


Release Project (file compilation tracking)
-----------------------------------
Link: /headers/lib1.h -> <lib1-commit3>/src/lib1.h
Link: /bin/lib1.so -> <build1-commit>/i386/lib1/lib1.so
Link: /bin/app1 -> <build1-commit>/i386/app1/app1


<lib1-commit1>, <lib1-commit2> and <lib1-commit3> should be the same,
dictated by the app1 project. Can we enforce this in the modules file
or should the different supermodules fix this somehow using
scripts/hooks?
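
One way a script or hook could check this is sketched below in Python.
Everything here is hypothetical: the `LINKS` table, its layout and the
commit ids are invented for illustration and do not correspond to any
existing git data structure.  The sketch only shows the check itself,
namely reporting every submodule that is pinned at more than one commit
across the projects involved.

```python
from collections import defaultdict

# Hypothetical link table: project -> list of (link path, submodule, pinned commit).
LINKS = {
    "app1": [("/headers/lib1.h", "lib1", "c0ffee1")],
    "build1": [("/src/lib1", "lib1", "c0ffee1"), ("/src/app1", "app1", "a11ce42")],
    "release": [("/headers/lib1.h", "lib1", "deadbee")],
}


def inconsistent_pins(links):
    """Return {submodule: {commit: [projects]}} for submodules pinned at several commits."""
    seen = defaultdict(lambda: defaultdict(list))
    for project, entries in links.items():
        for _path, submodule, commit in entries:
            seen[submodule][commit].append(project)
    return {
        submodule: dict(pins)
        for submodule, pins in seen.items()
        if len(pins) > 1  # more than one distinct commit means a mismatch
    }


print(inconsistent_pins(LINKS))
```

In this made-up data the release project pins lib1 at a different commit
than app1 and build1, so lib1 is reported; app1 is pinned only once and is
not.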

How do the super-projects in this case get access to the blobs pointed
by the links - transparent or explicit in the build-process?


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-14 21:27                                                                                             ` Torgil Svensson
@ 2006-12-14 23:07                                                                                               ` Josef Weidendorfer
  2006-12-15 17:43                                                                                                 ` Torgil Svensson
  0 siblings, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-14 23:07 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: R. Steve McKown, Linus Torvalds, git

On Thursday 14 December 2006 22:27, Torgil Svensson wrote:
> This example is somewhat complex since the build for lib1.so and the
> header-file might not have gone through the same commit on the lib1
> subproject.  Consider this example:

If you want to track build results for some source,
why would you ever want these builds go out of sync with the source?
As the built files depend on the source (and other things), the
source should be a submodule of the build project.

Hmm... I think I see a problem / wish for submodules here.

With the current submodule proposal, we force submodules to be
subdirectories inside of a supermodule.

Your example has the following submodule dependence
("X ==> Y" means Y being a submodule of X):

  App     ==>   Lib
   ^             ^
   |             |
 AppBuild ==> LibBuild

If we force submodules to be subdirectories of supermodules,
Lib needlessly will have to appear two times in a checkout of
AppBuild.

However, there is nothing wrong with it. Yet, you perhaps want
the 2 Lib submodules not to go out of sync. This easily
can be done with symlinking the Lib checkouts. As they are submodules,
everything should work fine.

Perhaps an option you want to have is to force a checkout
of AppBuild to do this symlinking itself when it detects
identical submodule links.

Hmmm... the only problem with a symlink is that it can go wrong
when moved. Unfortunately, I do not have a good solution for
this. We can not make UNIX symlinks smart in any way.
Hardlinking directories would be a solution, but that is not
possible.

Another thing:
With normal "$buildroot != $srcroot" environments, the source
can not be a subdirectory of the build directory.
Yet, we want to specify submodule/supermodule relation.
This is difficult to do with a submodule object, as it needs
to appear in trees in the supermodule.

Actually, the best workaround for this is to make Lib a direct
submodule of AppBuild, and specify the relationship of
LibBuild ==> Lib only in AppBuild.

BTW, build project commits probably should not depend on any
history of other build commits.
So you actually want all build commits to be root commits, and
have a tag name which could include the source commit id from
which the build was done. This gives some loose coupling.

> Link: /headers/lib1.h -> <lib1-commit3>/src/lib1.h
> Link: /bin/lib1.so -> <build1-commit>/i386/lib1/lib1.so
> Link: /bin/app1 -> <build1-commit>/i386/app1/app1
> 
> 
> <lib1-commit1>, <lib1-commit2> and <lib1-commit3> should be the same,
> dictated by the app1 project.

I do not see any problem here. Symlinks are stored in the git repository.
As the AppBuild commit depends on App and LibBuild submodule commits, the
symlinks always should be correct.

> Can we enforce this in the modules file 
> or should the different supermodules fix this somehow using
> scripts/hooks?

I do not see any need for a hook. But of course, a checkout hook should
be able to generate files/links. However, IMHO this should not be
done with hooks but with Makefile targets.

> How do the super-projects in this case get access to the blobs pointed
> by the links - transparent or explicit in the build-process?

Submodules should automatically be checked out when checking out the
supermodule. So the blobs should already be there.
Or do I miss something?

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-14 23:07                                                                                               ` Josef Weidendorfer
@ 2006-12-15 17:43                                                                                                 ` Torgil Svensson
  2006-12-15 21:42                                                                                                   ` Josef Weidendorfer
  0 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-15 17:43 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: R. Steve McKown, Linus Torvalds, git

On 12/15/06, Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> If you want to track build results for some source,
> why would you ever want these builds go out of sync with the source?

I don't, bad wording by me. That was the problem I wanted to address.

> Your example has the following submodule dependence
> ("X ==> Y" means Y being a submodule of X):
>
>   App     ==>   Lib
>    ^             ^
>    |             |
>  AppBuild ==> LibBuild

In my example "AppBuild" and "LibBuild" were the same project but this
scenario is relevant as well.

> If we force submodules to be subdirectories of supermodules,
> Lib needlessly will have to appear two times in a checkout of
> AppBuild.
>
> However, there is nothing wrong with it. Yet, you perhaps want
> the 2 Lib submodules not to go out of sync. This easily
> can be done with symlinking the Lib checkouts. As they are submodules,
> everything should work fine.

This is interesting. In my notation:

/path/to/link/name -> <commit>/path/to/subtree

means that there is a link named "name" in the tree object for
"path/to/link". The link points to a "link object" specifying a
subtree or blob of the tree that is pointed to in a submodule commit.
This is not currently implemented but has at least the following
advantages:

1. You can access files in a submodule without fetching the whole
submodule (which may be very large). (App1 is only interested in
lib1.h, the rest is totally irrelevant)
2. Superproject can access referenced (linked) files in its own
folder-structure without being forced a structure by the subproject.

If you use a symlink instead, don't you lose versioning information?
What happens with the symlinks if someone clones the superproject?

>
> Perhaps an option you want to have is to force a checkout
> of AppBuild to make these symlinking itself when it detects
> identical submodules links.
>
> Hmmm... the only problem with a symlink is that it can go wrong
> when moved. Unfortunately, I do not have a good solution for
> this. We can not make UNIX symlinks smart in any way.
> Hardlinking directories would be a solution, but that is not
> possible.
>

Wouldn't specifying the submodule path in the link object fit in well
here? Then each "link object" can represent a checked out tree from
the subproject in the superproject directory-structure.

> Another thing:
> With normal "$buildroot != $srcroot" environments, the source
> can not be a subdirectory of the build directory.

This is true for symlinks and would also be corrected if we have a
(sparse) submodule checkout there in its place.

> BTW, build project commits probably should not depend on any
> history of other build commits.

Why? Can you give an example here?

> > Link: /headers/lib1.h -> <lib1-commit3>/src/lib1.h
> > Link: /bin/lib1.so -> <build1-commit>/i386/lib1/lib1.so
> > Link: /bin/app1 -> <build1-commit>/i386/app1/app1
> >
> >
> > <lib1-commit1>, <lib1-commit2> and <lib1-commit3> should be the same,
> > dictated by the app1 project.
>
> I do not see any problem here. Symlinks are stored in the git repository.
> As the AppBuild commit depends on App and LibBuild submodule commits, the
> symlinks always should be correct.

The main reason for these "links" is versioning: the
unique SHA1 of the "link" representing a tree/blob in a version of the
submodule should be "included" in the supermodule's commit. Symlinks
won't give that at all.

> > How do the super-projects in this case get access to the blobs pointed
> > by the links - transparent or explicit in the build-process?
>
> Submodules should automatically be checked out when checking out the
> supermodule. So the blobs should already be there.
> Or do I miss something?

Probably not as that was a piece of the puzzle that I was missing.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-15 17:43                                                                                                 ` Torgil Svensson
@ 2006-12-15 21:42                                                                                                   ` Josef Weidendorfer
  2006-12-15 23:43                                                                                                     ` Torgil Svensson
  0 siblings, 1 reply; 252+ messages in thread
From: Josef Weidendorfer @ 2006-12-15 21:42 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: R. Steve McKown, Linus Torvalds, git

On Friday 15 December 2006 18:43, Torgil Svensson wrote:
> On 12/15/06, Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> > However, there is nothing wrong with it. Yet, you perhaps want
> > the 2 Lib submodules not to go out of sync. This easily
> > can be done with symlinking the Lib checkouts. As they are submodules,
> > everything should work fine.
> 
> This is interesting. In my notation:
> 
> /path/to/link/name -> <commit>/path/to/subtree
> 
> means that there is a link named "name" in the tree object for
> "path/to/link". The link points to a "link object" specifying a
> subtree or blob of the tree that is pointed to in a submodule commit.

Ah, now I understand. I somehow missed this notation.

> This is not currently implemented but has at least the following
> advantages:
> 
> 1. You can access files in a submodule without fetching the whole
> submodule (which may be very large). (App1 is only interested in
> lib1.h, the rest is totally irrelevant)
> 2. Superproject can access referenced (linked) files in its own
> folder-structure without being forced a structure by the subproject.

That all sounds fine, but how do you create such symlinks in practice?
Do you want to introduce special porcelain commands to create them?
Especially, what is the SCM user supposed to do to change the link
target, ie. from
 <commit>/path/to/subtree
to 
 <commit>/path2/to2/subtree2
?
Should this do a re-checkout at the other point?

By linking a file from a submodule, such a link seems to force that
this file has to be at a fixed position in the submodule. Otherwise,
some magic has to happen when the file is moved in the submodule,
possibly leading to a dangling link, eg. if the whole subdirectory
specified in the link is removed.
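
The dangling-link case can be made concrete with a small Python sketch.
The nested dict standing in for a submodule tree, the `links` mapping and
the function names are assumptions made for this example only; nothing
here is an existing git interface.

```python
def resolve(tree, path):
    """Walk a nested dict {name: bytes | dict} along path; return the node or None."""
    node = tree
    for part in (p for p in path.split("/") if p):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def dangling_links(links, tree):
    """Return the link names whose target path does not exist in the given tree."""
    return [name for name, target in links.items() if resolve(tree, target) is None]


# Hypothetical submodule trees before and after lib1.h is moved.
old_tree = {"src": {"lib1.h": b"/* api */\n"}}
new_tree = {"include": {"lib1.h": b"/* api */\n"}}
links = {"headers/lib1.h": "src/lib1.h"}

print(dangling_links(links, old_tree))  # []
print(dangling_links(links, new_tree))  # ['headers/lib1.h']
```

The link stays valid only as long as the path it names exists in the
pinned submodule tree, which is exactly the coupling described above.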

IMHO this is getting way too complex.
Much simpler is to include the full submodule at some path in
the supermodule, and create normal symlinks from the supermodule
into the submodule.

If you only want to check out part of a submodule, this should be
done with path-limiting checkouts, which should be a feature totally
independent from submodules.

And if you want to limit the number of objects transferred in cloning
of a subproject, it is better to further split this subproject into
multiple subprojects itself.

> If you do a symlink instead, don't you lose versioning information?

Of course, you need the submodule fully checked out somewhere in the
supermodule, and the link goes into the submodule directory. The
versioning is given by the supermodule/submodule link.

> What happens with the symlinks if someone clones the superproject?

As already said: the link has to go into a submodule directory, which
will be checked out automatically with the clone of the supermodule.

> > Perhaps an option you want to have is to force a checkout
> > of AppBuild to create these symlinks itself when it detects
> > identical submodule links.
> >
> > Hmmm... the only problem with a symlink is that it can go wrong
> > when moved. Unfortunately, I do not have a good solution for
> > this. We can not make UNIX symlinks smart in any way.
> > Hardlinking directories would be a solution, but that is not
> > possible.
> >
> 
> Wouldn't specifying the submodule path in the link object fit in well
> here? Then each "link object" can represent a checked out tree from
> the subproject in the superproject directory-structure.

The problem is not the representation in the git repository, but the
checked out module/submodule, where you need to use normal UNIX file semantics.
To move submodules around, the user should be able to just use
the normal UNIX "mv" commands, and git should be able to detect move
actions after the fact.
The simple thing here is that currently, git does not have this problem
as it tracks content, and does not even try to detect any moves at
commit time. This is different with submodules, as there, you want to
be able to track moves of any submodule root directories.

This now becomes a problem if you use symlinks to "unify" multiple checkouts
of the same submodule at multiple places in the supermodule, and move
the symlink around, as it easily can get dangling this way. Thus, you would
not have a way to see what submodule this link was talking about.

And for this thing, I do not see how your link object could help.

So it is better to use a simple submodule concept, and for these corner
cases, we perhaps could expect the user to fix e.g. a dangling symlink
to a previous submodule checkout himself, guided by a meaningful error message.

> > BTW, build project commits probably should not depend on any
> > history of other build commits.
> 
> Why? Can you give an example here.

If you have a source commit chain A => B => C => D, you want
to make any build commits totally independent: you first only
are interested in a build commit for source versions A and D,
and later find out that a build commit for B and C would be nice,
too. If you force build commits into some history order, this
order now would be A => D => B => C, which makes no sense.

Build commit independence can easily be achieved by making every commit
parentless, without further history. You still have the link
to the source version via the submodule link in the tree.
But so as not to lose any such build commits, they have to appear
as tags or refs (unless integrated in another superproject
build commit).

> > > Link: /headers/lib1.h -> <lib1-commit3>/src/lib1.h
> > > Link: /bin/lib1.so -> <build1-commit>/i386/lib1/lib1.so
> > > Link: /bin/app1 -> <build1-commit>/i386/app1/app1
> > >
> > >
> > > <lib1-commit1>, <lib1-commit2> and <lib1-commit3> should be the same,
> > > dictated by the app1 project.
> >
> > I do not see any problem here. Symlinks are stored in the git repository.
> > As the AppBuild commit depends on App and LibBuild submodule commits, the
> > symlinks always should be correct.
> 
> The main reason for these "links" are for versioning purposes: the
> unique SHA1 of the "link" representing a tree/blob in a version of the
> submodule should be "included" in the supermodules commit. Symlinks
> won't give that at all.

The version coupling will be there if the whole submodule is available
at some path in the supermodule checkout, as said above.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-15 21:42                                                                                                   ` Josef Weidendorfer
@ 2006-12-15 23:43                                                                                                     ` Torgil Svensson
  2006-12-16  1:13                                                                                                       ` Torgil Svensson
  0 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-15 23:43 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: R. Steve McKown, Linus Torvalds, git

On 12/15/06, Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> That all sounds fine, but how do you create such symlinks in practice?

I'm very open to suggestions here, but the concept growing in my head
is based around Linus' 'module' file and keeping things simple. A git
configuration file that specifies:
* link name for reference
* local path to link
* submodule source
* submodule path to tree/blob
* submodule commit / HEAD / branch
* options (depth-limit , ...)
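To make the field list above a bit more concrete, here is a minimal sketch of how such a modules file might be read. The file syntax, the section name `link`, the key names and the example URL are all assumptions invented for this sketch; nothing in this thread fixes a format.

```python
import configparser
from dataclasses import dataclass, field

# Hypothetical modules-file content. Section and key names are invented
# for this sketch; the thread does not define a concrete syntax.
EXAMPLE = """
[link "lib1-header"]
path = headers/lib1.h
source = git://example.org/lib1.git
subpath = src/lib1.h
rev = HEAD
depth = 1
"""

REQUIRED = ("path", "source", "subpath", "rev")


@dataclass
class LinkEntry:
    name: str      # link name for reference
    path: str      # local path to link
    source: str    # submodule source
    subpath: str   # submodule path to tree/blob
    rev: str       # submodule commit / HEAD / branch
    options: dict = field(default_factory=dict)  # e.g. a depth limit


def parse_modules(text):
    """Parse the hypothetical modules file into LinkEntry records."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    entries = []
    for section in parser.sections():
        kind, _, quoted = section.partition(" ")
        if kind != "link":
            continue  # ignore sections this sketch does not know about
        items = dict(parser[section])
        missing = [key for key in REQUIRED if key not in items]
        if missing:
            raise ValueError(f"{section}: missing keys {missing}")
        # Anything that is not a required key is treated as an option.
        options = {k: v for k, v in items.items() if k not in REQUIRED}
        entries.append(LinkEntry(
            name=quoted.strip('"'),
            path=items["path"],
            source=items["source"],
            subpath=items["subpath"],
            rev=items["rev"],
            options=options,
        ))
    return entries


if __name__ == "__main__":
    for entry in parse_modules(EXAMPLE):
        print(entry)
```

Keeping the unknown keys in a free-form `options` dictionary mirrors the open-ended "options (depth-limit, ...)" bullet; a real design would probably want a fixed set.
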

I'm reconsidering having the path-name in the link; it should be
sufficient to have two SHA1's, one for the commit and one for the
tree/blob. The super-module should have the tree/blob in its database so
that the link part is only there for version information and reference
(checking dirty state or history on the submodule). This way it is easy
to clone the super-project and use it without having to map up all
sub-project sources. Sub-project sources are not important for version
information and could always be specified in the project in a
README-type of file.

> Especially, what is the SCM user supposed to do to change the link
> target, ie. from
>  <commit>/path/to/subtree
> to
>  <commit>/path2/to2/subtree2
> ?
> Should this do a re-checkout at the other point?

That would be a change in the modules file, maybe through a command
that also fixes the link. The link will have to be updated in the
index and committed as normal.

> By linking a file from a submodule, such a link seems to force that
> this file has to be at a fixed position in the submodule. Otherwise,
> some magic has to happen when the file is moved in the submodule,
> possibly leading to a dangling link, eg. if the whole subdirectory
> specified in the link is removed.

Since we have the SHA1 (this is what we're using) and tree/blob
information in the super-module's database the change itself is not a
problem. The problem is to track renames/moves and your remove case in
the submodule. The tool that tracks the submodule should probably
warn/exit here and we would fix up the modules file manually.

> IMHO this is getting way too complex.

One complex situation here, as I see it, is the ability to
track/checkout only a subset (tree/blob) of the submodule. This is
also quite an important feature - in my example it means the
difference between tracking one header file and tracking the whole source.

> If you only want to check out part of a submodule, this should be
> done with path-limiting checkouts, which should be a feature totally
> independent from submodules.

If we can do path-limiting checkouts on a repo (module) we also can do
it on a sub-module since they are exactly the same. This is a very
powerful feature and it'd be a huge waste if it wasn't allowed for a
super-module to do on submodules.

> And if you want to limit the number of objects transferred in cloning
> of a subproject, it is better to further split this subproject into
> multiple subprojects itself.

What if we have no control of the submodule?  This can be tracked from
upstream, sourceforge, another company, etc. Submodules will often
live their own life and could be X, kernel, gcc, cairo, whatever, ...

> The problem is not the representation in the git repository, but the
> checked out module/submodule, where you need to use normal UNIX file semantics.
> To move submodules around, the user should be able to just use
> the normal UNIX "mv" commands, and git should be able to detect move
> actions after the fact.

If we disregard the commit info, the link will act exactly as a normal
tree/blob. Git can know we're moving a subproject by watching the
modules file. The main problem is to keep the modules file up-to-date with
reality. We could enforce modules file validity by disallowing such
operations and let the user do a "force" operation which also alters
the modules file.

> This now becomes a problem if you use symlinks to "unify" multiple checkouts
> of the same submodule at multiple places in the supermodule, and move
> the symlink around, as it easily can get dangling this way. Thus, you would
> not have a way to see what submodule this link was talking about.

The symlink only exists in the modules file. We only have the SHA1's
at the tree-level and there we have everything underneath the
tree/blob SHA1 in our database. We will only know if the modules
symlink file is dangling next time we fetch from the submodule - here
we would notify the user but our database is still consistent.

> If you have a source commit chain A => B => C => D, you want
> to make any build commits totally independent: you first only
> are interested in a build commit for source versions A and D,
> and later find out that a build commit for B and C would be nice,
> too. If you force build commits into some history order, this
> order now would be A => D => B => C, which makes no sense.

It makes no sense because the user seems to have acted irrationally. The
commit-chain is completely valid as it has tracked the correct history
of the builds. I can't see any problems here, the build-project is
independent of the source-project with its own history. We can hope
the user has given good explanations for his/her actions in the commit
messages though.

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-15 23:43                                                                                                     ` Torgil Svensson
@ 2006-12-16  1:13                                                                                                       ` Torgil Svensson
  2006-12-16  1:20                                                                                                         ` Torgil Svensson
  2006-12-16  1:49                                                                                                         ` Linus Torvalds
  0 siblings, 2 replies; 252+ messages in thread
From: Torgil Svensson @ 2006-12-16  1:13 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: R. Steve McKown, Linus Torvalds, git

On 12/16/06, Torgil Svensson <torgil.svensson@gmail.com> wrote:

> I'm very open to suggestions here, but the concept growing in my head
> is based around Linus' 'module' file and keeping things simple. A git
> configuration file that specifies:
> * link name for reference
> * local path to link
> * submodule source
> * submodule path to tree/blob
> * submodule commit / HEAD / branch
> * options (depth-limit , ...)
>
> I'm reconsidering having the path-name in the link; it should be
> sufficient to have two SHA1's, one for the commit and one for the
> tree/blob. The super-module should have the tree/blob in its database so
> that the link part is only there for version information and reference
> (checking dirty state or history on the submodule). This way it is easy
> to clone the super-project and use it without having to map up all
> sub-project sources. Sub-project sources are not important for version
> information and could always be specified in the project in a
> README-type of file.
>

Think of the link as being there only for the version handling between
different modules; it's the modules file that gives a UI to the
link (which project, branch, ...). Many users will not care what's
behind those links, but if they want to edit the link they have to
create the modules file or fetch it somewhere - it may even be
provided and version controlled in the project itself.

example tree object:

100644 blob <sha1 of blob>    README
100644 blob <sha1 of blob>    REPORTING-BUGS
100644 link <sha1 of blob>     <sha1 of commit>
040000 tree <sha1 of tree>    arch
040000 tree <sha1 of tree>    block
040000 link <sha1 of tree>     <sha1 of commit>

Note that the links function exactly like the blobs and trees in the
database. The difference is that they originate from _a_subproject_ (we
don't care which at this stage) with the specified commit SHA1. If the
link isn't represented in the modules file, it's no big deal, it can
be added later on if needed.

If the blame-game begins or if we want to check what we're using on a
submodule level we can always pinpoint the exact file/tree content and

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  1:13                                                                                                       ` Torgil Svensson
@ 2006-12-16  1:20                                                                                                         ` Torgil Svensson
  2006-12-16  1:34                                                                                                           ` Jakub Narebski
  2006-12-16  1:49                                                                                                         ` Linus Torvalds
  1 sibling, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-16  1:20 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: R. Steve McKown, Linus Torvalds, git

On 12/16/06, Torgil Svensson <torgil.svensson@gmail.com> wrote:
>
> example tree object:
>
> 100644 blob <sha1 of blob>    README
> 100644 blob <sha1 of blob>    REPORTING-BUGS
> 100644 link <sha1 of blob>     <sha1 of commit>
> 040000 tree <sha1 of tree>    arch
> 040000 tree <sha1 of tree>    block
> 040000 link <sha1 of tree>     <sha1 of commit>
>

Sorry, I was sloppy and forgot the names:

100644 blob <sha1 of blob>    README
100644 blob <sha1 of blob>    REPORTING-BUGS
100644 link <sha1 of blob>     <sha1 of commit>   AUTHORS
040000 tree <sha1 of tree>    arch
040000 tree <sha1 of tree>    block
040000 link <sha1 of tree>     <sha1 of commit>   misc

Now it doesn't look like trees/blobs anymore so maybe a link object is handy:

100644 blob <sha1 of blob>    README
100644 blob <sha1 of blob>    REPORTING-BUGS
100644 link <sha1 of link>      AUTHORS
040000 tree <sha1 of tree>    arch
040000 tree <sha1 of tree>    block
040000 link <sha1 of link>     misc

link-object:
<sha1 of commit>
<sha1 of tree/blob>

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  1:20                                                                                                         ` Torgil Svensson
@ 2006-12-16  1:34                                                                                                           ` Jakub Narebski
  2006-12-16  8:40                                                                                                             ` Torgil Svensson
  0 siblings, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-12-16  1:34 UTC (permalink / raw)
  To: git

Torgil Svensson wrote:

> On 12/16/06, Torgil Svensson <torgil.svensson@gmail.com> wrote:
>>
>> example tree object:
>>
>> 100644 blob <sha1 of blob>    README
>> 100644 blob <sha1 of blob>    REPORTING-BUGS
>> 100644 link <sha1 of blob>     <sha1 of commit>
>> 040000 tree <sha1 of tree>    arch
>> 040000 tree <sha1 of tree>    block
>> 040000 link <sha1 of tree>     <sha1 of commit>
>>
> 
> Sorry, I was sloppy and forgot the names:
> 
> 100644 blob <sha1 of blob>    README
> 100644 blob <sha1 of blob>    REPORTING-BUGS
> 100644 link <sha1 of blob>     <sha1 of commit>   AUTHORS
> 040000 tree <sha1 of tree>    arch
> 040000 tree <sha1 of tree>    block
> 040000 link <sha1 of tree>     <sha1 of commit>   misc
> 
> Now it doesn't look like trees/blobs anymore so maybe a link object
> is handy:
> 
> 100644 blob <sha1 of blob>    README
> 100644 blob <sha1 of blob>    REPORTING-BUGS
> 100644 link <sha1 of link>      AUTHORS
> 040000 tree <sha1 of tree>    arch
> 040000 tree <sha1 of tree>    block
> 040000 link <sha1 of link>     misc
> 
> link-object:
> <sha1 of commit>
> <sha1 of tree/blob>

What do you need <sha1 of tree/blob> for in link-object? Wouldn't you
usually use the sha1 of the top tree of a commit, which is uniquely defined
by the commit object, so you need only <sha1 of commit>?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  1:13                                                                                                       ` Torgil Svensson
  2006-12-16  1:20                                                                                                         ` Torgil Svensson
@ 2006-12-16  1:49                                                                                                         ` Linus Torvalds
  2006-12-16  2:12                                                                                                           ` Linus Torvalds
  1 sibling, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-12-16  1:49 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: Josef Weidendorfer, R. Steve McKown, git



On Sat, 16 Dec 2006, Torgil Svensson wrote:
> 
> 100644 blob <sha1 of blob>    README
> 100644 blob <sha1 of blob>    REPORTING-BUGS
> 100644 link <sha1 of blob>     <sha1 of commit>
> 040000 tree <sha1 of tree>    arch
> 040000 tree <sha1 of tree>    block
> 040000 link <sha1 of tree>     <sha1 of commit>

That 040000 needs to be something else.

In order for something like a git-fsck-objects to know that it's a link, 
it needs to be marked as such. 

In git, we never just randomly open an object by SHA1, and then figure out 
its type. We always open things by explicitly knowing both the type and 
the SHA1, and if the object we find has the wrong type, that's a 
consistency error in the database (or the user).
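The rule described above (always look an object up by type and SHA1 together, and treat a mismatch as an error) can be sketched with a toy in-memory store. The `ObjectStore` class and `ObjectTypeError` are hypothetical names for illustration and are not git's internal API; only the hashing of `"<type> <size>\0"` followed by the content follows git's object format.

```python
import hashlib


class ObjectTypeError(Exception):
    """Raised when a stored object is not of the type the caller expected."""


class ObjectStore:
    """Toy in-memory object database keyed by SHA-1 (illustration only)."""

    def __init__(self):
        self._objects = {}

    def write(self, obj_type, data):
        # Git hashes "<type> <size>\0" followed by the raw content.
        header = f"{obj_type} {len(data)}".encode() + b"\0"
        sha = hashlib.sha1(header + data).hexdigest()
        self._objects[sha] = (obj_type, data)
        return sha

    def read(self, sha, expected_type):
        # The caller must say which type it expects; a mismatch is
        # reported as a consistency error rather than silently accepted.
        obj_type, data = self._objects[sha]
        if obj_type != expected_type:
            raise ObjectTypeError(
                f"{sha}: expected {expected_type}, found {obj_type}")
        return data


if __name__ == "__main__":
    store = ObjectStore()
    sha = store.write("blob", b"hello\n")
    print(store.read(sha, "blob"))
    try:
        store.read(sha, "tree")
    except ObjectTypeError as err:
        print("consistency error:", err)
```

This is why a tree entry has to carry enough information (here, the mode) for a checker to know in advance that the SHA1 it holds names a link rather than a tree.
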


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  1:49                                                                                                         ` Linus Torvalds
@ 2006-12-16  2:12                                                                                                           ` Linus Torvalds
  2006-12-16  8:50                                                                                                             ` Torgil Svensson
  0 siblings, 1 reply; 252+ messages in thread
From: Linus Torvalds @ 2006-12-16  2:12 UTC (permalink / raw)
  To: Torgil Svensson; +Cc: Josef Weidendorfer, R. Steve McKown, git

On Fri, 15 Dec 2006, Linus Torvalds wrote:

> 
> On Sat, 16 Dec 2006, Torgil Svensson wrote:
> > 
> > 100644 blob <sha1 of blob>    README
> > 100644 blob <sha1 of blob>    REPORTING-BUGS
> > 100644 link <sha1 of blob>     <sha1 of commit>
> > 040000 tree <sha1 of tree>    arch
> > 040000 tree <sha1 of tree>    block
> > 040000 link <sha1 of tree>     <sha1 of commit>
> 
> That 040000 needs to be something else.

Side note: that's not to say that I would really see why you'd want to 
have both the tree and the commit SHA1's, and why you seemingly think that 
the links don't need a filename. Hmm?

If you require the tree objects to be in the database, you might as well 
require that the commit object be there. But you could make rules that say 
that subprojects don't need the whole commit history, for example (which 
is just a shallow clone in the subproject).

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  1:34                                                                                                           ` Jakub Narebski
@ 2006-12-16  8:40                                                                                                             ` Torgil Svensson
  2006-12-16  9:57                                                                                                               ` Jakub Narebski
  0 siblings, 1 reply; 252+ messages in thread
From: Torgil Svensson @ 2006-12-16  8:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Josef Weidendorfer, R. Steve McKown, git, Linus Torvalds

On 12/16/06, Jakub Narebski <jnareb@gmail.com> wrote:
> >>
> >
> > Sorry, I was sloppy and forgot the names:
> >
> > 100644 blob <sha1 of blob>    README
> > 100644 blob <sha1 of blob>    REPORTING-BUGS
> > 100644 link <sha1 of blob>     <sha1 of commit>   AUTHORS
> > 040000 tree <sha1 of tree>    arch
> > 040000 tree <sha1 of tree>    block
> > 040000 link <sha1 of tree>     <sha1 of commit>   misc
> >
> > Now it doesn't look like trees/blobs anymore so maybe a link object
> > is handy:
> >
> > 100644 blob <sha1 of blob>    README
> > 100644 blob <sha1 of blob>    REPORTING-BUGS
> > 100644 link <sha1 of link>      AUTHORS
> > 040000 tree <sha1 of tree>    arch
> > 040000 tree <sha1 of tree>    block
> > 040000 link <sha1 of link>     misc
> >
> > link-object:
> > <sha1 of commit>
> > <sha1 of tree/blob>
>
> What do you need <sha1 of tree/blob> for in link-object? Wouldn't you
> usually use the sha1 of the top tree of a commit, which is uniquely defined
> by the commit object, so you need only <sha1 of commit>?
>

1. "Sparse" repositories - In my example, I want to cherry-pick
header-files or binary-files from different projects without fetching
all, potentially huge, submodules in their entirety. Imagine having X,
kernel, gcc, gtk and libc6 as sub-projects when you really only care
about some header files.

2. Super-module directory-hierarchy independent from submodules.
The super-project wants to have the header-files and binaries its own way.
This also gives version controlled file-collections, the "release
case" in my example - collecting different binaries and header-files
from different submodules together in a new directory-structure, add
some documentation and configuration files and get the whole thing
under strong version-control down to the beginning of time for each
little component.

3. Super-module development independent of submodules - If we have the
tree/blob-object with all its contents in the database many
git-operations can act as if the link (commit) weren't there since we have
access to all relevant data to work with. This makes it easy to clone
the super-project and work on it seamlessly without having to care
about submodules or mapping up submodule repositories (unless you want
to modify the links or the data underneath it of course).

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  2:12                                                                                                           ` Linus Torvalds
@ 2006-12-16  8:50                                                                                                             ` Torgil Svensson
  0 siblings, 0 replies; 252+ messages in thread
From: Torgil Svensson @ 2006-12-16  8:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josef Weidendorfer, R. Steve McKown, git

On 12/16/06, Linus Torvalds <torvalds@osdl.org> wrote:

> Side note: that's not to say that I would really see why you'd want to
> have both the tree and the commit SHA1's, and why you seemingly think that
> the links don't need a filename. Hmm?

I really want that file-name back - we can call it a mind short-circuit.

> If you require the tree objects to be in the database, you might as well
> require that the commit object be there. But you could make rules that say
> that subprojects don't need the whole commit history, for example (which
> is just a shallow clone in the subproject).

You have a very good point here, this would give us the history of the

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  8:40                                                                                                             ` Torgil Svensson
@ 2006-12-16  9:57                                                                                                               ` Jakub Narebski
  2006-12-16 10:25                                                                                                                 ` Junio C Hamano
  2006-12-16 15:05                                                                                                                 ` Torgil Svensson
  0 siblings, 2 replies; 252+ messages in thread
From: Jakub Narebski @ 2006-12-16  9:57 UTC (permalink / raw)
  To: git

<posted and mailed>

Torgil Svensson wrote:

> On 12/16/06, Jakub Narebski <jnareb@gmail.com> wrote:

>>> Now it doesn't look like trees/blobs anymore so maybe a link object
>>> is handy:
>>>
>>> 100644 blob <sha1 of blob>    README
>>> 100644 blob <sha1 of blob>    REPORTING-BUGS
>>> 100644 link <sha1 of link>      AUTHORS
>>> 040000 tree <sha1 of tree>    arch
>>> 040000 tree <sha1 of tree>    block
>>> 040000 link <sha1 of link>     misc

This would be (using the original submodule proposal)

    140000 link <sha1 of link>     misc

>>> link-object:
>>> <sha1 of commit>
>>> <sha1 of tree/blob>
>>
>> What do you need <sha1 of tree/blob> for in link-object? Wouldn't you
>> use usually the sha1 of top tree of a commit, which is uniquely defined
>> by commit object, so you need only <sha1 of commit>?
>>
> 
> 1. "Sparse" repositories - In my example, I want to cherry-pick
> header-files or binary-files from different projects without fetching
> all, potentially huge, submodules in their entirety. Imagine having X,
> kernel, gcc, gtk and libc6 as sub-projects when you really only care
> about some header files.
> 
> 2. Super-module directory-hierarchy independent from submodules.
> The super-project wants to have the header-files and binaries its own way.
> This also gives version controlled file-collections, the "release
> case" in my example - collecting different binaries and header-files
> from different submodules together in a new directory-structure, add
> some documentation and configuration files and get the whole thing
> under strong version-control down to the beginning of time for each
> little component.

All fine, but this does not and I think cannot protect us from the
fact that we can have <sha1 of tree/blob> which doesn't match
<sha1 of commit>.

I think it would be better to have sparse/partial checkout first.
But that is just my idea. Because with a <sha1 of tree/blob> which
is not the sha1 of the commit's tree you might lose (I think) the ability
to merge, for example, your changes to the submodule with upstream.

> 3. Super-module development independent of submodules - If we have the
> tree/blob-object with all its contents in the database many
> git-operations can act as if the link (commit) weren't there since we have
> access to all relevant data to work with. This makes it easy to clone
> the super-project and work on it seamlessly without having to care
> about submodules or mapping up submodule repositories (unless you want
> to modify the links or the data underneath it of course).

I think this is irrelevant to whether we have only <sha1 of commit>,
or a link object that also carries <sha1 of tree/blob>.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  9:57                                                                                                               ` Jakub Narebski
@ 2006-12-16 10:25                                                                                                                 ` Junio C Hamano
  2006-12-16 15:05                                                                                                                 ` Torgil Svensson
  1 sibling, 0 replies; 252+ messages in thread
From: Junio C Hamano @ 2006-12-16 10:25 UTC (permalink / raw)
  To: jnareb
  Cc: Torgil Svensson, Josef Weidendorfer, R. Steve McKown, git,
	Linus Torvalds

[jc: adding back people on CC list while feeling sick of having
to do so...]

Jakub Narebski <jnareb@gmail.com> writes:

>>>> Now it doesn't look like trees/blobs anymore so maybe a link object
>>>> is handy:
>>>>
>>>> 100644 blob <sha1 of blob>    README
>>>> 100644 blob <sha1 of blob>    REPORTING-BUGS
>>>> 100644 link <sha1 of link>      AUTHORS
>>>> 040000 tree <sha1 of tree>    arch
>>>> 040000 tree <sha1 of tree>    block
>>>> 040000 link <sha1 of link>     misc
>
> This would be (using the original submodule proposal)
>
>     140000 link <sha1 of link>     misc

If I recall correctly, the original original proposal used
160000 because that is not a bitpattern used in stat.h, and a
link behaves like a directory and a symbolic link at the same
time.
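The bit-pattern argument can be checked with a few lines. The table below lists only the traditional Unix file-type values as they appear in a typical stat.h; it is a sketch and deliberately omits platform-specific extensions, so "not in table" means only that, not that no system anywhere uses the value. The names `FILE_TYPES`, `LINK_MODE` and `describe` are made up for the illustration.

```python
# Traditional Unix file-type bits (octal), as found in a typical stat.h.
# Platform-specific extensions are deliberately left out of this table.
S_IFMT = 0o170000   # mask selecting the file-type bits
FILE_TYPES = {
    0o010000: "fifo",
    0o020000: "character device",
    0o040000: "directory",
    0o060000: "block device",
    0o100000: "regular file",
    0o120000: "symbolic link",
    0o140000: "socket",
}

S_IFDIR = 0o040000
S_IFLNK = 0o120000

# The 160000 pattern is the bitwise OR of "directory" and "symbolic link".
LINK_MODE = S_IFDIR | S_IFLNK


def describe(mode):
    """Name the file type encoded in mode, according to the table above."""
    return FILE_TYPES.get(mode & S_IFMT, "not in table")


if __name__ == "__main__":
    for mode in (0o040000, 0o140000, LINK_MODE):
        print(f"{mode:06o}: {describe(mode)}")
```

In this table 140000 is already the socket type, while 160000 is free and reads naturally as "directory plus symlink", which matches the reasoning given for choosing it.
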

^ permalink raw reply	[flat|nested] 252+ messages in thread

* Re: [RFC] Submodules in GIT
  2006-12-16  9:57                                                                                                               ` Jakub Narebski
  2006-12-16 10:25                                                                                                                 ` Junio C Hamano
@ 2006-12-16 15:05                                                                                                                 ` Torgil Svensson
  2006-12-16 15:38                                                                                                                   ` Torgil Svensson
  2006-12-16 16:32                                                                                                                   ` Jakub Narebski
  1 sibling, 2 replies; 252+ messages in thread
From: Torgil Svensson @ 2006-12-16 15:05 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Josef Weidendorfer, R. Steve McKown, git, Linus Torvalds

On 12/16/06, Jakub Narebski <jnareb@gmail.com> wrote:
> All fine, but this does not and I think cannot protect us from the
> fact that we can have <sha1 of tree/blob> which doesn't match
> <sha1 of commit>.

True, that will be a real problem. Unless we have a bug in git, do you
see a scenario in which this is likely to happen?


> I think it would be better to have sparse/partial checkout first.
> But that is just my idea. Because with a <sha1 of tree/blob> which
> is not the sha1 of the commit's tree you might lose (I think) the ability
> to merge, for example, your changes to the submodule with upstream.

That's correct. I also want a sparse/partial checkout but I don't want
the full submodule path. I'm also perfectly fine (for my current
use-cases) with not being able to merge upstream unless we're tracking
the commit tree (here, we might not want to specify the tree SHA1).

I'm not trying to impose a technically fragile solution here [I don't
believe it is, but I'm not the most competent to say that either], I'm
trying to find solutions for my use cases and I had problems adapting
them to the current suggestion.


> > 3. Super-module development independent of submodules - If we have the
> > tree/blob-object with all its contents in the database many
> > git-operations can act as if the link (commit) weren't there since we have
> > access to all relevant data to work with. This makes it easy to clone
> > the super-project and work on it seamlessly without having to care
> > about submodules or mapping up submodule repositories (unless you want
> > to modify the links or the data underneath it of course).
>
> I think this is irrelevant to whether we have only <sha1 of commit>,
> or a link object that also carries <sha1 of tree/blob>.



* Re: [RFC] Submodules in GIT
  2006-12-16 15:05                                                                                                                 ` Torgil Svensson
@ 2006-12-16 15:38                                                                                                                   ` Torgil Svensson
  2006-12-16 16:32                                                                                                                   ` Jakub Narebski
  1 sibling, 0 replies; 252+ messages in thread
From: Torgil Svensson @ 2006-12-16 15:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Josef Weidendorfer, R. Steve McKown, git, Linus Torvalds

On 12/16/06, Torgil Svensson <torgil.svensson@gmail.com> wrote:
> On 12/16/06, Jakub Narebski <jnareb@gmail.com> wrote:
> > All fine, but this does not and I think cannot protect us from the
> > fact that we can have <sha1 of tree/blob> which doesn't match
> > <sha1 of commit>.
>
> True, that will be a real problem. Unless we have a bug in git, do you
> see a scenario in which this is likely to happen?
>
> I also want a sparse/partial checkout but I don't want
> the full submodule path.

This might not be as problematic as we think. Suppose we do the same
sparse/partial checkout (what's the definition here?) with the <sha1
of tree/blob> as we do in the <sha1 of commit>-only case, and
consider the <sha1 of tree/blob> to be a _local_ (to the
super-project) shortcut. Then we only track the submodules using the
commit, local conflicts are easier to handle, and git would refuse to
commit a <sha1 of tree/blob> not present in the commit tree.

We might even consider two object types:
module: <sha1 of commit> name
link: <sha1 of commit> <sha1 of tree/blob> name
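
To make the two proposed entry types concrete, here is a minimal sketch
in Python. It only parses the textual form written above; the Entry
class, the parse_entry helper and the "kind: fields" layout are
assumptions for illustration, not an existing git object format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical in-memory form of a 'module' or 'link' entry."""
    kind: str                     # "module" or "link"
    commit: str                   # <sha1 of commit>
    name: str                     # entry name in the super-project
    target: Optional[str] = None  # <sha1 of tree/blob>, only for "link"


def parse_entry(line: str) -> Entry:
    """Parse 'module: <commit> name' or 'link: <commit> <target> name'.

    Assumes names contain no whitespace (a simplification).
    """
    kind, rest = line.split(":", 1)
    kind = kind.strip()
    fields = rest.split()
    if kind == "module" and len(fields) == 2:
        return Entry("module", fields[0], fields[1])
    if kind == "link" and len(fields) == 3:
        return Entry("link", fields[0], fields[2], target=fields[1])
    raise ValueError(f"malformed entry: {line!r}")
```

A "module" entry carries only the commit, while a "link" entry carries
the commit plus the local shortcut to a tree/blob inside it.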



* Re: [RFC] Submodules in GIT
  2006-12-16 15:05                                                                                                                 ` Torgil Svensson
  2006-12-16 15:38                                                                                                                   ` Torgil Svensson
@ 2006-12-16 16:32                                                                                                                   ` Jakub Narebski
  2006-12-17  0:21                                                                                                                     ` Torgil Svensson
  1 sibling, 1 reply; 252+ messages in thread
From: Jakub Narebski @ 2006-12-16 16:32 UTC (permalink / raw)
  To: Torgil Svensson, git; +Cc: Josef Weidendorfer, R. Steve McKown, Linus Torvalds

Torgil Svensson wrote:
> On 12/16/06, Jakub Narebski <jnareb@gmail.com> wrote:
>> All fine, but this does not and I think cannot protect us from the
>> fact that we can have <sha1 of tree/blob> which doesn't match
>> <sha1 of commit>.
> 
> True, that will be a real problem. Unless we have a bug in git, do you
> see a scenario in which this is likely to happen?

Well, I would just rather have the definition of sparse checkout
(for example subdirectory name, or file name, or glob pattern)
than <sha1 of tree/blob>.

Besides, you need the name of the directory (for a tree) or file (for
a blob); otherwise you would have no way to update it when the
submodule advances to a new version and you want to use that version.
And if you have that, you don't need <sha1 of tree/blob> in the
repository, in the link object.
You might want it in the index, for performance reasons, though.
 
>> I think it would be better to have sparse/partial checkout first.
>> But that is just my idea. Because with <sha1 of tree/blob> which
>> is not sha1 of commit tree you might lose (I think) the ability
>> to merge, for example your changes to submodule with upstream.
> 
> That's correct. I also want a sparse/partial checkout but I don't want
> the full submodule path. I'm also perfectly fine (for my current
> use-cases) with not being able to merge upstream unless we're tracking
> the commit tree (here, we might not want to specify the tree SHA1).

With sparse (for example defined by 'src/*.h') or partial (for example
defined by 'Documentation/') checkout you should be able to merge
upstream... unless conflicts are in the not checked out part.
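
As a rough sketch of that rule (Python, illustration only): a checkout
spec ending in '/' is treated as a partial checkout of that directory,
anything else as a glob pattern for a sparse checkout. The helper names
in_checkout and can_merge are hypothetical, and this is not how any
existing git command decides it.

```python
from fnmatch import fnmatchcase


def in_checkout(path: str, spec: str) -> bool:
    """Return True if path belongs to the checked-out part."""
    if spec.endswith("/"):
        # Partial checkout: everything below one directory.
        return path.startswith(spec)
    # Sparse checkout: glob pattern. Note that fnmatch's '*' also
    # matches '/', so 'src/*.h' matches headers in subdirectories too.
    return fnmatchcase(path, spec)


def can_merge(conflicting_paths, spec: str) -> bool:
    """A merge is resolvable only if every conflict is checked out."""
    return all(in_checkout(p, spec) for p in conflicting_paths)
```

With spec 'src/*.h', a conflict in src/cache.h can be resolved in the
working tree, while a conflict in Makefile cannot.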

> I'm not trying to impose a technically fragile solution here [I don't
> believe it is, but I'm not the most competent to say that either], I'm
> trying to find solutions for my use cases and I had problems adapting
> them to the current suggestion.

Have you read  http://git.or.cz/gitwiki/SubprojectSupport on GitWiki?
Have you tested the experimental submodule support (proof of concept)
  http://git.admingilde.org/tali/git.git/module2
by Martin Waitz?

-- 
Jakub Narebski


* Re: [RFC] Submodules in GIT
  2006-12-16 16:32                                                                                                                   ` Jakub Narebski
@ 2006-12-17  0:21                                                                                                                     ` Torgil Svensson
  0 siblings, 0 replies; 252+ messages in thread
From: Torgil Svensson @ 2006-12-17  0:21 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Josef Weidendorfer, R. Steve McKown, Linus Torvalds

On 12/16/06, Jakub Narebski <jnareb@gmail.com> wrote:
> Well, I would just rather have the definition of sparse checkout
> (for example subdirectory name, or file name, or glob pattern)
> than <sha1 of tree/blob>.

This is entirely a UI issue:

On 12/16/06, Torgil Svensson <torgil.svensson@gmail.com> wrote:
> is based around Linus 'module'-file and keep things simple. A git
> configuration file that specifies:
> * link name for reference
> * local path to link
> * submodule source
> * submodule path to tree/blob
> * submodule commit / HEAD / branch
> * options (depth-limit , ...)

On 12/16/06, Jakub Narebski <jnareb@gmail.com> wrote:
> And if you have that, you don't need <sha1 of tree/blob> in repository, in link object.

Correct. Since the commit contains all the version information, the
following combinations should give the same information iff we keep
the commit in the database:
1. <sha1 of commit> + <sha1 of tree/blob>
2. <sha1 of commit> + <symlink to tree/blob>

I used the sha1 because I wanted them to behave exactly like
trees/blobs in the database for operations that can disregard the
commit info. Now, if we keep the commit in the database as Linus
suggests, we can reach the target from there with a symlink. This would
be more readable but would also cost a few extra object lookups when
following the symlink.
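
A toy sketch of that trade-off in Python, for illustration only: the
object database is modelled as a plain dict and resolve() is a
hypothetical helper, not a git API. It shows that <sha1 of commit> plus
a path reaches the same object as a stored <sha1 of tree/blob>, at the
cost of one lookup per path component.

```python
def resolve(db, commit_id, path):
    """Walk from a commit to the object at path.

    db maps object ids to a commit ({"tree": id}) or to a tree
    ({name: id}). Returns (object id, number of lookups performed).
    """
    lookups = 1                      # reading the commit itself
    obj_id = db[commit_id]["tree"]
    for part in path.split("/"):
        tree = db[obj_id]            # one extra lookup per component
        lookups += 1
        obj_id = tree[part]
    return obj_id, lookups


# Toy database: commit c1 -> tree t0 -> lib/ (t1) -> util.h (b1)
db = {
    "c1": {"tree": "t0"},
    "t0": {"lib": "t1"},
    "t1": {"util.h": "b1"},
}
```

Here resolve(db, "c1", "lib/util.h") ends at the same blob id that
combination 1 would have stored directly.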

> With sparse (for example defined by 'src/*.h') or partial (for example
> defined by 'Documentation/') checkout you should be able to merge
> upstream... unless conflicts are in the not checked out part.

This would be a great feature! Will this conflict with path shortcuts?
If so, we might consider two types of objects: "link" which cannot
merge upstream and "module" which can merge upstream and contains a
.git object repository.

IMHO, "module" is a more intuitive name for specifying a
(functionally fully fledged) submodule with a repository inside.
"link" could be used for just mirroring a tree/blob. I'm not sure if a
separation is needed on a technical level.

> Have you read  http://git.or.cz/gitwiki/SubprojectSupport on GitWiki?
Yes
> Have you tested the experimental submodule support (proof of concept)
Not yet


Thread overview: 252+ messages
2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
2006-11-20 22:16 ` Jakub Narebski
2006-11-20 22:28   ` Martin Waitz
2006-11-20 22:43   ` Junio C Hamano
2006-11-20 23:02     ` Jakub Narebski
2006-11-20 23:52       ` Martin Waitz
2006-11-21  1:31       ` Sam Vilain
2006-11-20 23:05     ` Linus Torvalds
2006-11-20 23:25       ` J. Bruce Fields
2006-11-20 23:33         ` Martin Waitz
2006-11-21 18:01           ` J. Bruce Fields
2006-11-21 19:32             ` Martin Waitz
2006-11-20 23:29       ` Martin Waitz
2006-11-21  0:10       ` Junio C Hamano
2006-11-21  0:42         ` Jakub Narebski
2006-11-21  6:21           ` Martin Waitz
2006-11-21 10:04             ` Jakub Narebski
2006-11-21 11:49               ` Martin Waitz
2006-11-21  6:27         ` Martin Waitz
2006-11-21  7:36           ` Junio C Hamano
2006-11-21  7:55             ` Martin Waitz
2006-11-21 22:31       ` Yann Dirson
2006-11-21 22:51         ` Linus Torvalds
2006-11-21 22:59           ` Linus Torvalds
2006-11-21 23:54           ` Yann Dirson
2006-11-22  3:40             ` Shawn Pearce
2006-11-23 23:23               ` Yann Dirson
2006-11-25  6:53                 ` Shawn Pearce
2006-11-25 11:12                   ` Yann Dirson
2006-11-25 18:57                     ` Linus Torvalds
2006-11-25 19:19                       ` Steven Grimm
2006-11-25 19:30                         ` Linus Torvalds
2006-11-25 23:49                           ` Yann Dirson
2006-11-26  1:14                             ` Sven Verdoolaege
2006-11-26  1:32                               ` Yann Dirson
2006-11-26  3:39                             ` Linus Torvalds
2006-11-26  8:05                               ` Daniel Barkalow
2006-11-28  9:36                                 ` Andreas Ericsson
2006-11-28 10:29                                   ` Andy Parkins
2006-11-28 10:50                                     ` Jakub Narebski
2006-11-28 13:35                                       ` Andy Parkins
2006-11-28 15:44                                         ` Shawn Pearce
2006-11-28 16:29                                           ` Andy Parkins
2006-11-28 16:36                                             ` Shawn Pearce
2006-11-28 17:38                                             ` Jon Loeliger
2006-11-29 16:15                                             ` Martin Waitz
2006-11-30 11:57                                             ` sf
     [not found]                                               ` <200611301255.41733.andyparkins@gmail.com>
2006-11-30 14:00                                                 ` Stephan Feder
2006-11-30 14:49                                                   ` Andy Parkins
2006-11-30 15:20                                                     ` Sven Verdoolaege
2006-11-30 15:30                                                       ` Andy Parkins
2006-11-30 15:50                                                         ` Andreas Ericsson
2006-11-30 16:08                                                           ` Andy Parkins
2006-11-30 16:33                                                         ` Sven Verdoolaege
2006-12-01  0:01                                                           ` Andy Parkins
2006-12-01  0:11                                                             ` Jakub Narebski
2006-12-01  9:32                                                             ` Sven Verdoolaege
2006-12-01 10:19                                                               ` Andy Parkins
2006-11-30 17:19                                                         ` Martin Waitz
2006-11-30 16:05                                                     ` sf
2006-11-30 16:12                                                       ` sf
2006-12-01  9:19                                                       ` Andy Parkins
2006-12-01  9:57                                                         ` Martin Waitz
2006-12-01 10:29                                                           ` Andy Parkins
2006-12-01 10:42                                                             ` Sven Verdoolaege
2006-12-01 11:02                                                               ` Andy Parkins
2006-12-01 11:10                                                                 ` Sven Verdoolaege
2006-12-01 11:45                                                                   ` sf
2006-12-01 12:12                                                                   ` Andy Parkins
2006-12-01 12:28                                                                     ` Martin Waitz
2006-12-01 14:11                                                                       ` Andy Parkins
2006-12-01 15:12                                                                         ` Martin Waitz
2006-12-01 11:46                                                                 ` Martin Waitz
2006-12-01 12:16                                                                   ` Andy Parkins
2006-12-01 12:34                                                                     ` Martin Waitz
2006-12-01 13:59                                                                       ` Andy Parkins
2006-12-01 14:07                                                                         ` Martin Waitz
2006-12-01 11:31                                                             ` Martin Waitz
2006-12-01 12:20                                                               ` Andy Parkins
2006-12-01 12:37                                                                 ` Martin Waitz
2006-12-02 15:16                                                                   ` Jakub Narebski
2006-11-28 19:58                                         ` Steven Grimm
2006-11-28 21:02                                           ` Shawn Pearce
2006-11-29 16:03                                         ` Martin Waitz
2006-11-29 20:00                                           ` Andy Parkins
2006-11-30 12:16                                             ` Andreas Ericsson
2006-11-30 12:40                                               ` Andy Parkins
2006-11-30 17:06                                             ` Martin Waitz
2006-11-30 18:57                                               ` Andreas Ericsson
2006-12-01  8:49                                                 ` Andy Parkins
2006-12-01  9:33                                                   ` Andreas Ericsson
2006-12-01 10:38                                                     ` Andy Parkins
2006-12-01 12:03                                                 ` sf
2006-12-01 12:11                                                   ` Martin Waitz
2006-12-01 13:21                                                     ` sf
2006-12-01 13:43                                                       ` Martin Waitz
2006-12-01 14:23                                                         ` Stephan Feder
2006-12-01 15:07                                                           ` Martin Waitz
2006-12-01 16:04                                                             ` Stephan Feder
2006-12-01 16:15                                                               ` Martin Waitz
2006-12-05  9:01                                                 ` Uwe Kleine-Koenig
2006-12-05 10:33                                                   ` Andreas Ericsson
2006-12-05 11:11                                                     ` Jakub Narebski
2006-12-05 15:02                                                     ` Uwe Kleine-Koenig
2006-12-05 15:30                                                       ` Andreas Ericsson
2006-12-05 16:00                                                   ` Sven Verdoolaege
2006-12-01  9:02                                               ` Andy Parkins
2006-12-01 11:00                                                 ` Martin Waitz
2006-12-01 12:09                                                   ` sf
2006-12-01 12:12                                                     ` Martin Waitz
2006-12-01 13:05                                                       ` sf
2006-12-01 13:35                                                         ` Martin Waitz
2006-12-01 13:43                                                           ` Andreas Ericsson
2006-12-01 13:46                                                             ` Martin Waitz
2006-12-01 14:52                                                               ` Andreas Ericsson
2006-12-01 15:00                                                                 ` Martin Waitz
2006-12-01 16:38                                                                   ` Andreas Ericsson
2006-12-01 16:49                                                                     ` Linus Torvalds
2006-12-01 17:08                                                                       ` sf
2006-12-01 18:06                                                                         ` Andreas Ericsson
2006-12-01 20:13                                                                         ` Linus Torvalds
2006-12-01 20:30                                                                           ` Martin Waitz
2006-12-01 23:23                                                                             ` Alan Chandler
2006-12-01 22:06                                                                           ` Josef Weidendorfer
2006-12-01 22:12                                                                             ` Martin Waitz
2006-12-01 22:26                                                                               ` Josef Weidendorfer
2006-12-01 22:40                                                                                 ` Martin Waitz
2006-12-01 23:17                                                                               ` Josef Weidendorfer
2006-12-02 20:24                                                                                 ` Martin Waitz
2006-12-03  0:55                                                                                   ` Josef Weidendorfer
2006-12-03  6:29                                                                                     ` Martin Waitz
2006-12-01 22:26                                                                             ` Linus Torvalds
2006-12-01 22:41                                                                               ` sf
2006-12-01 23:03                                                                                 ` Josef Weidendorfer
2006-12-01 23:09                                                                                 ` Linus Torvalds
2006-12-01 23:36                                                                                   ` Josef Weidendorfer
2006-12-02  0:12                                                                                     ` Linus Torvalds
2006-12-02  9:22                                                                                       ` Andy Parkins
     [not found]                                                                                         ` <200612021255.59972.Josef.Weidendorfer@gmx.de>
2006-12-03  9:42                                                                                           ` Andy Parkins
2006-12-02 11:32                                                                                       ` Josef Weidendorfer
2006-12-02 19:52                                                                                         ` Linus Torvalds
2006-12-02 20:21                                                                                           ` Martin Waitz
2006-12-02 20:46                                                                                             ` Linus Torvalds
2006-12-02 20:58                                                                                               ` Martin Waitz
2006-12-03  1:11                                                                                                 ` Josef Weidendorfer
2006-12-02 20:18                                                                                       ` Martin Waitz
2006-12-02 20:44                                                                                         ` Linus Torvalds
2006-12-02 21:06                                                                                           ` Martin Waitz
2006-12-02 21:29                                                                                             ` Linus Torvalds
2006-12-02 21:22                                                                                           ` Linus Torvalds
2006-12-03  2:07                                                                                             ` Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT] Josef Weidendorfer
2006-12-03  2:25                                                                                               ` Linus Torvalds
2006-12-03  2:46                                                                                               ` Shawn Pearce
2006-12-03  3:21                                                                                                 ` Josef Weidendorfer
2006-12-03 11:10                                                                                                   ` Jakub Narebski
2006-12-03 11:47                                                                                                     ` Josef Weidendorfer
2006-12-03 20:46                                                                                           ` [RFC] Submodules in GIT Martin Waitz
2006-12-03 22:16                                                                                       ` Sven Verdoolaege
2006-12-03 22:32                                                                                         ` Linus Torvalds
2006-12-03 22:49                                                                                           ` Jakub Narebski
2006-12-04 11:12                                                                                           ` Josef Weidendorfer
2006-12-01 23:49                                                                                   ` sf
2006-12-02 18:57                                                                                     ` Torgil Svensson
2006-12-02 19:41                                                                                       ` Linus Torvalds
2006-12-03  9:19                                                                                         ` Torgil Svensson
2006-12-03 17:54                                                                                           ` Linus Torvalds
2006-12-04 20:26                                                                                             ` Torgil Svensson
2006-12-04 20:41                                                                                               ` Linus Torvalds
2006-12-04 21:36                                                                                                 ` Torgil Svensson
2006-12-05 10:42                                                                                                   ` Andreas Ericsson
2006-12-05 11:09                                                                                                     ` Jakub Narebski
2006-12-05 10:38                                                                                                 ` Andreas Ericsson
2006-12-05 11:01                                                                                                   ` Jakub Narebski
2006-12-03 19:33                                                                                         ` Andy Parkins
2006-12-05  2:33                                                                                         ` Daniel Barkalow
2006-12-05 22:07                                                                                           ` sf
2006-12-09 21:34                                                                                         ` R. Steve McKown
2006-12-10 11:47                                                                                           ` Torgil Svensson
2006-12-14 21:27                                                                                             ` Torgil Svensson
2006-12-14 23:07                                                                                               ` Josef Weidendorfer
2006-12-15 17:43                                                                                                 ` Torgil Svensson
2006-12-15 21:42                                                                                                   ` Josef Weidendorfer
2006-12-15 23:43                                                                                                     ` Torgil Svensson
2006-12-16  1:13                                                                                                       ` Torgil Svensson
2006-12-16  1:20                                                                                                         ` Torgil Svensson
2006-12-16  1:34                                                                                                           ` Jakub Narebski
2006-12-16  8:40                                                                                                             ` Torgil Svensson
2006-12-16  9:57                                                                                                               ` Jakub Narebski
2006-12-16 10:25                                                                                                                 ` Junio C Hamano
2006-12-16 15:05                                                                                                                 ` Torgil Svensson
2006-12-16 15:38                                                                                                                   ` Torgil Svensson
2006-12-16 16:32                                                                                                                   ` Jakub Narebski
2006-12-17  0:21                                                                                                                     ` Torgil Svensson
2006-12-16  1:49                                                                                                         ` Linus Torvalds
2006-12-16  2:12                                                                                                           ` Linus Torvalds
2006-12-16  8:50                                                                                                             ` Torgil Svensson
2006-12-02 20:12                                                                                   ` Martin Waitz
2006-12-01 22:55                                                                               ` Josef Weidendorfer
2006-12-01 23:07                                                                                 ` Martin Waitz
2006-12-01 23:30                                                                                 ` Linus Torvalds
2006-12-02  0:14                                                                                   ` Josef Weidendorfer
2006-12-02  0:33                                                                                     ` Linus Torvalds
2006-12-02  9:27                                                                                       ` Andy Parkins
2006-12-04 18:56                                                                                       ` Michael K. Edwards
2006-12-05  1:31                                                                                         ` Sam Vilain
2006-12-01 22:35                                                                           ` sf
2006-12-08 18:29                                                                           ` Jon Loeliger
2006-12-08 18:45                                                                             ` Sven Verdoolaege
2006-12-12  8:32                                                                             ` Andreas Ericsson
2006-12-01 17:14                                                                       ` Martin Waitz
2006-12-01 16:57                                                                     ` Martin Waitz
2006-12-01 18:08                                                                       ` Andreas Ericsson
2006-12-01 18:51                                                                         ` Martin Waitz
2006-12-01 13:51                                                           ` Stephan Feder
2006-12-01 14:58                                                             ` Martin Waitz
2006-12-01 15:47                                                               ` Stephan Feder
2006-12-01 16:54                                                                 ` Martin Waitz
2006-12-01 17:33                                                                   ` Stephan Feder
2006-12-01 18:48                                                                     ` Martin Waitz
2006-12-01 23:34                                                                       ` sf
2006-12-02 19:46                                                                         ` Martin Waitz
2006-12-01 19:17                                                                     ` Andy Parkins
2006-12-01 19:38                                                                       ` Martin Waitz
2006-12-01 21:04                                                                         ` Andy Parkins
2006-12-01 21:37                                                                           ` Martin Waitz
2006-12-01 21:54                                                                             ` Andy Parkins
2006-12-01 22:08                                                                               ` Martin Waitz
2006-12-02 10:04                                                                                 ` Andy Parkins
2006-12-02 13:50                                                                                   ` Josef Weidendorfer
2006-12-02 20:43                                                                                     ` Martin Waitz
2006-12-03  1:02                                                                                       ` Josef Weidendorfer
2006-12-02 20:40                                                                                   ` Martin Waitz
2006-12-02 13:14                                                                           ` Jakub Narebski
2006-12-02 13:08                                                                     ` Jakub Narebski
2006-12-02 12:48                                                   ` Jakub Narebski
2006-11-28 17:28                                   ` Daniel Barkalow
2006-11-28 18:08                                     ` Sven Verdoolaege
2006-11-28 18:37                                       ` Daniel Barkalow
2006-11-28 19:06                                         ` Sven Verdoolaege
2006-11-28 20:41                                           ` Daniel Barkalow
2006-11-28 21:10                                             ` Shawn Pearce
2006-11-28 21:32                                               ` Daniel Barkalow
2006-11-28 21:53                                                 ` Linus Torvalds
2006-11-20 22:49 ` Jakub Narebski
2006-11-21  7:21 ` Shawn Pearce
2006-11-22  5:29 ` Petr Baudis
2006-12-02 20:16 ` Jakub Narebski
2006-12-03  1:24   ` Robin Rosenberg
2006-12-03  1:31     ` Jakub Narebski
2006-12-03 12:22       ` Robin Rosenberg
2006-12-03 12:31         ` Jakub Narebski
2006-12-03 11:00     ` Jakub Narebski
