Subprojects tasks

* Subprojects tasks
@ 2006-12-16 18:32 Junio C Hamano
  2006-12-16 18:45 ` Jakub Narebski
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Junio C Hamano @ 2006-12-16 18:32 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git

Because I am primarily a plumber, I was thinking about the
changes that need to be done at the plumbing level.  I only
looked at the prototype when it was announced, and I do not know
the progress you made since then.  Could you tell us the current
status?

I am assuming that the overall design is based on what Linus
proposed a long time ago with his "gitlink" object.  That is,

 * the index and the tree object for the superproject has a
   "link" object that tells there is a directory and the
   corresponding commit object name from the subproject.  Unlike
   my previous "bind commit" based prototype, index does not
   have any blobs nor trees from the subproject.

 * the subproject is on its own, and can exist unaware of the
   existence of its superproject (there is no back-link at the
   object layer).

 * the subproject and the superproject are loosely coupled.  An
   act of committing in one does not automatically make a
   corresponding commit in the other at the plumbing level.

For now, I assume that the representation of the "link" object
is (after the usual object header) 40-byte hexadecimal SHA-1 of
the commit object, plus a LF, but that is a minor detail.
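A minimal sketch of that representation, assuming the "link" object is
framed like the existing object types (a "<type> <size>" header, a NUL,
then the payload).  The "link" type is only the proposal under
discussion here, and the helper names are made up for illustration.

```python
import hashlib

HEX_DIGITS = set("0123456789abcdef")

def make_link_object(commit_hex: str) -> bytes:
    """Serialize a hypothetical "link" object: header, NUL, then payload."""
    if len(commit_hex) != 40 or not set(commit_hex) <= HEX_DIGITS:
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    payload = commit_hex.encode("ascii") + b"\n"   # 40 hex bytes plus LF
    header = b"link %d\x00" % len(payload)         # same framing as blob/tree/commit
    return header + payload

def object_name(raw: bytes) -> str:
    """The object name is the SHA-1 over header plus payload."""
    return hashlib.sha1(raw).hexdigest()
```

With this framing the payload is always 41 bytes, so every "link"
object would start with "link 41" followed by a NUL.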

At the object layer, obviously you would need a new object type
allocated, "link", and the mode-bits assigned in the tree
object.  You should be able to cover cat-file, fsck-objects
(connectivity through link->commit), pack-objects and
unpack-objects (they need to know about the new object type) at
that stage.

With the index, in addition to the mode bits and "link" object's
SHA-1, you would need to decide what to do with ce_match_stat(),
to keep track of the information "update-index --refresh" updates.
My recommendation is to:

 (1) Look at the directory the "link" is at, and find .git/
     subdirectory (that is the $GIT_DIR for the subproject) and
     its .git/HEAD;

 (2) If that points at a loose ref, use the file's stat()
     information (e.g. stat("$sub/.git/refs/heads/master"));

 (3) Otherwise, use the packed-refs file's stat() information
     (e.g. stat("$sub/.git/packed-refs")).

Then ce_match_stat() for a "link" entry can do the same
computation and tell if the subproject has changed its HEAD.
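The three steps can be sketched roughly as below.  This is only an
illustration in Python, not the C code in read-cache.c; the function
names are hypothetical, and the detached-HEAD case (not covered by the
steps above) falls back to HEAD itself as an assumption.

```python
import os

def link_stat_target(sub: str) -> str:
    """Pick the file whose stat() data would stand in for a "link" entry."""
    git_dir = os.path.join(sub, ".git")                 # step (1)
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        loose = os.path.join(git_dir, head[len("ref: "):])
        if os.path.exists(loose):
            return loose                                # step (2): loose ref
        return os.path.join(git_dir, "packed-refs")     # step (3): packed refs
    return os.path.join(git_dir, "HEAD")                # assumption: detached HEAD

def stat_signature(path: str):
    """A small subset of the stat() fields an index entry could remember."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino)

def link_is_clean(sub: str, cached_signature) -> bool:
    """Redo the same computation and compare with what was cached."""
    return stat_signature(link_stat_target(sub)) == cached_signature
```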

I think "update-index --add $directory" should check if .git/
exists and looks like a valid repository, and make a "link"
object out of "$directory/.git/HEAD".
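As a rough sketch of the lookup "update-index --add $directory" would
need, the following resolves "$directory/.git/HEAD" to a commit name,
trying the loose ref first and then packed-refs.  The function name is
hypothetical and the repository check is deliberately naive (it only
tests that .git/HEAD exists).

```python
import os

def resolve_head(sub: str) -> str:
    """Return the commit name that $sub/.git/HEAD currently points at."""
    git_dir = os.path.join(sub, ".git")
    head_path = os.path.join(git_dir, "HEAD")
    if not os.path.isfile(head_path):
        raise ValueError(f"{sub} does not look like a repository")
    with open(head_path) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return head                       # detached HEAD holds the name itself
    ref = head[len("ref: "):]
    loose = os.path.join(git_dir, ref)
    if os.path.isfile(loose):
        with open(loose) as f:
            return f.read().strip()
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed) as f:
            for line in f:
                if line.startswith(("#", "^")):
                    continue              # header line or peeled-tag line
                sha, _, name = line.strip().partition(" ")
                if name == ref:
                    return sha
    raise ValueError(f"{ref} not found in {git_dir}")
```

The returned name is what would go into the "link" entry for that
directory.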

Another issue with the index is what to put in the cache_tree
structure; I think "link" can be treated just like blob (both
files and symlinks).

Then read-tree (bulk of it is in unpack-trees.c) needs to be
taught to read in "link" and put that into the index -- this
should be straightforward.

After you have a working index, you should be able to do
write-tree (writes the new "link" entry as is, without
descending into the subproject) trivially.

It is debatable what 'checkout-index -f' should do when the
subproject is already checked out and its HEAD points at a
different commit.  I am tempted to say that it should go there
and run "reset --hard", but I feel uneasy about that because it
is a blatant layering violation.  Maybe it should simply ignore
link entries and let the Porcelain layer take care of them.

Then there are three diff- brothers at the plumbing level.  I
think it is reasonable not to make them recurse into "link",
even in the presence of the -r (recursive) option (Porcelain "git
diff" might want to recurse into the subproject, perhaps with a
new --recurse-harder option, though).

That means diff-files either skips a "link" entry if
ce_match_stat() says it is clean, or feeds "link" and its
recorded SHA-1 from the index on the left hand side, and 0{40}
on the right hand side with "link" type (after verifying that
"$sub/.git" is still there -- otherwise you would say that the
working tree has lost that subproject).  "Read from the working
tree" done for diff-files for a "link" object would grab the
commit SHA-1 from the tip of the current branch of the
subproject, format it as the value of a "link" object (I am
assuming just a 40-byte commit SHA-1, plus a LF) and would
compare that with the result of read_sha1_file() on the link
object recorded in the index when producing -p (patch) output.

diff-tree would compare "link" entries without descending into
them.  "link" and "blob" would compare just like "symlink" and
"blob" would compare.

diff-index with --cached would work like diff-tree (two concrete
SHA-1), and without --cached would work like diff-files (one
SHA-1 from the tree, another is either from the index if
ce_match_stat() says it is clean, 0{40} otherwise).
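To make the left/right sides concrete, here is a small sketch of what
diff-files and diff-index would report for a single "link" entry.  The
function names, the boolean inputs and the tuple return value are
assumptions for illustration; the "M"/"D" letters are borrowed from the
existing raw diff output.

```python
NULL_SHA1 = "0" * 40

def diff_files_link(index_sha, is_clean, sub_git_exists):
    """Index versus working tree; None means nothing to report."""
    if not sub_git_exists:
        return ("D", index_sha, NULL_SHA1)   # working tree lost the subproject
    if is_clean:
        return None                          # ce_match_stat() says unchanged
    return ("M", index_sha, NULL_SHA1)       # right side to be read from $sub

def diff_index_link(tree_sha, index_sha, is_clean, cached):
    """Tree versus index (--cached) or tree versus working tree."""
    if cached or is_clean:
        right = index_sha                    # a concrete SHA-1 on both sides
    else:
        right = NULL_SHA1                    # dirty: value unknown until read
    if tree_sha == right:
        return None
    return ("M", tree_sha, right)
```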

I suspect the hardest part is "rev-list --objects" (now most of
it is found in revision.c).  Theoretically, if the code can
handle "tag"s, it should be able to handle "link"s, but I have a
feeling that the ancestry traversal code that walks commits is
not prepared to see "commit" object to appear from somewhere in
the middle of traversal.  A commit so far can be wrapped only by
tags zero or more times, and a tag never appears inside anything
but another tag, so the code can just keep peeling the tag until
it sees a non-tag and after that it will be living in the world
that has only commit->tree->blob hierarchy, and can afford to do
the ancestry based solely on "commit" and can treat reachability
for "tree" and "blob" as afterthought.  But I think the updated
code needs to know that "link" needs to be unwrapped and
contained "commit" needs to be injected back to the ancestry
walking machinery.
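A toy model of the traversal described here, assuming an in-memory
object table instead of the real object database.  It peels tags, walks
commit -> tree -> blob, and when a tree contains a "link" it pushes the
wrapped commit back onto the commit queue.  All names are made up for
illustration; this is not how revision.c is structured.

```python
def list_objects(objects, tips):
    """objects maps a name to (kind, data):
         commit -> (tree, [parents])   tree -> [child names]
         tag    -> target name         link -> commit name
         blob   -> None
    Tips are assumed to peel to commits."""
    seen, order = set(), []
    queue = list(tips)                    # commits still to be walked

    def visit(name):
        if name in seen:
            return False
        seen.add(name)
        order.append(name)
        return True

    def walk_tree(name):
        if not visit(name):
            return
        for child in objects[name][1]:
            kind = objects[child][0]
            if kind == "tree":
                walk_tree(child)
            elif kind == "link":
                if visit(child):
                    # unwrap the link and inject its commit into the walk
                    queue.append(objects[child][1])
            else:
                visit(child)              # blob

    while queue:
        name = queue.pop()
        kind, data = objects[name]
        while kind == "tag":              # keep peeling until a non-tag
            visit(name)
            name = data
            kind, data = objects[name]
        if not visit(name):
            continue
        tree, parents = data
        queue.extend(parents)
        walk_tree(tree)
    return order
```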

Once you have "rev-list --objects", you should be able to drive
pack-objects with its output.  I do not think there is much to
change in that program.

My gut feeling is that it would take about 2-3 weeks for a
competent plumber working on it full time to get the above changes
to the plumbing side into presentable shape.

On the Porcelain side, you would need policy design, some of
which were discussed on the list, such as what committing and
fetching in a superproject mean and should do.  I do not have a
guesstimate of the amount of work that would involve.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 18:32 Subprojects tasks Junio C Hamano
@ 2006-12-16 18:45 ` Jakub Narebski
  2006-12-16 23:01   ` Martin Waitz
  2006-12-16 20:35 ` Sven Verdoolaege
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-16 18:45 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

>  (1) Look at the directory the "link" is at, and find .git/
>      subdirectory (that is the $GIT_DIR for the subproject) and
>      its .git/HEAD;

Or .gitlink file, if we decide to implement it (as lightweight checkout and
support for submodules which one can easily move/rename).
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 18:32 Subprojects tasks Junio C Hamano
  2006-12-16 18:45 ` Jakub Narebski
@ 2006-12-16 20:35 ` Sven Verdoolaege
  2006-12-16 21:07   ` Junio C Hamano
  2006-12-17  8:48 ` Alan Chandler
  2006-12-17 11:17 ` Martin Waitz
  3 siblings, 1 reply; 24+ messages in thread
From: Sven Verdoolaege @ 2006-12-16 20:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Waitz, git

On Sat, Dec 16, 2006 at 10:32:36AM -0800, Junio C Hamano wrote:
> I suspect the hardest part is "rev-list --objects" (now most of
> it is found in revision.c).  [..]  But I think the updated
> code needs to know that "link" needs to be unwrapped and
> contained "commit" needs to be injected back to the ancestry
> walking machinery.

Do we want "link" to be unwrapped, though ?

> Once you have "rev-list --objects", you should be able to drive
> pack-objects with its output.

Wouldn't we then run into the scalability problems Linus was
concerned about ?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 20:35 ` Sven Verdoolaege
@ 2006-12-16 21:07   ` Junio C Hamano
  2006-12-16 22:58     ` Martin Waitz
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2006-12-16 21:07 UTC (permalink / raw)
  To: skimo; +Cc: Martin Waitz, git

Sven Verdoolaege <skimo@kotnet.org> writes:

> On Sat, Dec 16, 2006 at 10:32:36AM -0800, Junio C Hamano wrote:
>> I suspect the hardest part is "rev-list --objects" (now most of
>> it is found in revision.c).  [..]  But I think the updated
>> code needs to know that "link" needs to be unwrapped and
>> contained "commit" needs to be injected back to the ancestry
>> walking machinery.
>
> Do we want "link" to be unwrapped, though ?
>
>> Once you have "rev-list --objects", you should be able to drive
>> pack-objects with its output.
>
> Wouldn't we then run into the scalability problems Linus was
> concerned about ?

Hmph.

If the plumbing layer does not have to (although I haven't
thought it through, it does feel like it even shouldn't) unwrap
"link" and can let the Porcelain layer deal with it, that would
certainly make rev-list/revision.c part simpler.

I like it.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 21:07   ` Junio C Hamano
@ 2006-12-16 22:58     ` Martin Waitz
  2006-12-16 23:14       ` Sven Verdoolaege
  2006-12-17  0:32       ` Josef Weidendorfer
  0 siblings, 2 replies; 24+ messages in thread
From: Martin Waitz @ 2006-12-16 22:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: skimo, git

hoi :)

Junio, I'll take a more detailed look at your mail tomorrow, after I
regenerated from all the Guinness I had tonight ;-)

On Sat, Dec 16, 2006 at 01:07:04PM -0800, Junio C Hamano wrote:
> Sven Verdoolaege <skimo@kotnet.org> writes:
> > On Sat, Dec 16, 2006 at 10:32:36AM -0800, Junio C Hamano wrote:
> >> I suspect the hardest part is "rev-list --objects" (now most of
> >> it is found in revision.c).  [..]  But I think the updated
> >> code needs to know that "link" needs to be unwrapped and
> >> contained "commit" needs to be injected back to the ancestry
> >> walking machinery.

Well, I already got to the point of using the commit directly,
instead of any link object.  It even worked with rev-list --objects
in all my test cases.  That is, I could correctly clone/pack/pull
the complete project including all modules.

> > Wouldn't we then run into the scalability problems Linus was
> > concerned about ?

This is a real problem.

> If the plumbing layer does not have to (although I haven't
> thought it through, it does feel like it even shouldn't) unwrap
> "link" and let the Porcelain layer to deal with it, that would
> certainly make rev-list/revision.c part simpler.

Yes.  However, it makes other things more complicated.
If the plumbing does not do all the subproject stuff and you don't have
everything in one database it is much more difficult to really get
a consistent database when cloning or fetching (you have to get even old
submodule commits which are not reachable by the current supermodule
tree anymore, perhaps even submodules which do not exist anymore).

I did not have much time to think about these issues in the last day and
am not yet sure how to proceed.

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 18:45 ` Jakub Narebski
@ 2006-12-16 23:01   ` Martin Waitz
  2006-12-16 23:15     ` Jakub Narebski
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Waitz @ 2006-12-16 23:01 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

hoi :)

On Sat, Dec 16, 2006 at 07:45:11PM +0100, Jakub Narebski wrote:
> Or .gitlink file, if we decide to implement it (as lightweight checkout and
> support for submodules which one can easily move/rename).

I still don't get the advantage of a .gitlink file over an ordinary
repository with alternates or a symlink.

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 22:58     ` Martin Waitz
@ 2006-12-16 23:14       ` Sven Verdoolaege
  2006-12-17  0:32       ` Josef Weidendorfer
  1 sibling, 0 replies; 24+ messages in thread
From: Sven Verdoolaege @ 2006-12-16 23:14 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Junio C Hamano, git

On Sat, Dec 16, 2006 at 11:58:10PM +0100, Martin Waitz wrote:
> On Sat, Dec 16, 2006 at 01:07:04PM -0800, Junio C Hamano wrote:
> > Sven Verdoolaege <skimo@kotnet.org> writes:
> > > On Sat, Dec 16, 2006 at 10:32:36AM -0800, Junio C Hamano wrote:
> > >> I suspect the hardest part is "rev-list --objects" (now most of
> > >> it is found in revision.c).  [..]  But I think the updated
> > >> code needs to know that "link" needs to be unwrapped and
> > >> contained "commit" needs to be injected back to the ancestry
> > >> walking machinery.
> 
> Well, I already got to the point of using the commit directly,
> instead of any link object.

I think Junio is simply referring to the type of the object as represented
in a tree and that the value would indeed just be the commit hash, as in
your implementation.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 23:01   ` Martin Waitz
@ 2006-12-16 23:15     ` Jakub Narebski
  2006-12-17  0:01       ` Josef Weidendorfer
  2006-12-17  0:08       ` Josef Weidendorfer
  0 siblings, 2 replies; 24+ messages in thread
From: Jakub Narebski @ 2006-12-16 23:15 UTC (permalink / raw)
  To: Martin Waitz; +Cc: git, Josef Weidendorfer

Hi!

Martin Waitz wrote:
> On Sat, Dec 16, 2006 at 07:45:11PM +0100, Jakub Narebski wrote:
>>
>> Or .gitlink file, if we decide to implement it (as lightweight checkout and
>> support for submodules which one can easily move/rename).
> 
> I still don't get the advantage of a .gitlink file over an ordinary
> repository with alternates or a symlink.

Moving or renaming the directory with a submodule. With alternates,
when you rename or move directory with a submodule, you have to add
alternate for new place / new name, or alter existing alternate.
With symlinks you risk broken symlinks.

When using alternates-like modules file, you can regenerate or
generate "alternates" on checkout, but...

With a .gitlink file you can specify the GIT_DIR for a submodule as a given
directory relative to this directory or one of its parents, so you
can rename and move submodules freely.

P.S. The second (first?) purpose of .gitlink is to be able to have
lightweight checkout, i.e. more than one working area associated with
one repository.

P.P.S. Cc to the author of current .gitlink proposal, to Josef
Weidendorfer.
  Message-ID: <200612082252.31245.Josef.Weidendorfer@gmx.de>
  http://permalink.gmane.org/gmane.comp.version-control.git/33755
-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 23:15     ` Jakub Narebski
@ 2006-12-17  0:01       ` Josef Weidendorfer
  2006-12-17 11:45         ` Martin Waitz
  2006-12-17  0:08       ` Josef Weidendorfer
  1 sibling, 1 reply; 24+ messages in thread
From: Josef Weidendorfer @ 2006-12-17  0:01 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Martin Waitz, git

On Sunday 17 December 2006 00:15, Jakub Narebski wrote:
> Hi!
> 
> Martin Waitz wrote:
> > On Sat, Dec 16, 2006 at 07:45:11PM +0100, Jakub Narebski wrote:
> >>
> >> Or .gitlink file, if we decide to implement it (as lightweight checkout and
> >> support for submodules which one can easily move/rename).
> > 
> > I still don't get the advantage of a .gitlink file over an ordinary
> > repository with alternates or a symlink.
> 
> Moving or renaming the directory with a submodule. With alternates,
> when you rename or move directory with a submodule, you have to add
> alternate for new place / new name, or alter existing alternate.
> With symlinks you risk broken symlinks.

Yes.

IMHO it simply is added flexibility to allow a checkout to be separate from
the .git/ directory, same as explicitly setting $GIT_DIR would do.
So this .gitlink file is on the one hand a convenience for users
who want to keep their repository separate, yet do not want to specify
$GIT_DIR all the time in front of git commands.
The .gitlink file simply makes the linkage to the separate repository
persistent.

In the scope of submodules, you get the benefit that you can not lose
submodule repositories by doing a "rm -rf *" (or similar, e.g. deleting
dirs with submodules in it) in the supermodule checkout. Actually, the
latter is a valid action: delete a submodule in the next commit;
when going back at an earlier commit, the submodule should be there again.
So IMHO you allow far more possibilities by separating GITDIR from the
checkout of submodules.

However. I think that this .gitlink file proposal can be seen as kind
of independent from submodule support at first; it should be easy to
make this work together later on. E.g. submodule root directories
can be easily detected when they have a .gitlink file (instead of
.git/ directory), and so on.

This said, I started implementing it, but do not have anything useful
to show yet.
Some issues:
* Probably, it is better to go with a _file_ .git instead of a file
.gitlink, as this way, the user is forced to either go with the
git repository in _directory_ .git/ or external linkage with
the _file_ ".git".
* Even when a .gitlink file is detected, we should honor a
$GIT_DIR environment variable set by the user. Unfortunately, $GIT_DIR
also can be set by porcelain commands to specify "this command only
works in the toplevel directory of a git checkout", i.e. these
porcelain commands set GIT_DIR to ".git". IMHO this is a hack, and
we should explicitly tell the plumbing about this need, e.g. via another
environment variable (or an option), without implicitly forcing it by
setting $GIT_DIR.
* To make the .gitlink file as flexible as
possible (and to use it for lightweight checkouts), it really should
support $GIT_HEAD_FILE, which would replace "HEAD" with the content
of $GIT_HEAD_FILE. E.g. with GIT_HEAD_FILE=MYHEAD, the command
"git log HEAD" really should internally work as "git log MYHEAD"
(ie. use the .git/MYHEAD file instead). It is arguable whether the
usage of "ORIG_HEAD" by the user or in porcelain should map to file
"ORIG_MYHEAD". Probably not. However, changing this in all places
is some work, and I assume that therefore nobody has ever implemented
$GIT_HEAD_FILE - which IMHO really would be useful by itself. 
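To illustrate the first two issues, here is a sketch of repository
discovery with a _file_ ".git".  The file format is not specified in
this thread; the sketch assumes it holds a single path, taken relative
to the directory containing the file.  The function name and that
format are assumptions, and $GIT_HEAD_FILE is not modelled at all.

```python
import os

def find_git_dir(start, environ=None):
    """Locate the repository for a checkout, walking up from `start`."""
    environ = os.environ if environ is None else environ
    explicit = environ.get("GIT_DIR")
    if explicit:
        return explicit                   # a user-set $GIT_DIR always wins
    directory = os.path.abspath(start)
    while True:
        candidate = os.path.join(directory, ".git")
        if os.path.isdir(candidate):
            return candidate              # ordinary repository directory
        if os.path.isfile(candidate):
            with open(candidate) as f:    # assumed format: one path per file
                target = f.read().strip()
            return os.path.normpath(os.path.join(directory, target))
        parent = os.path.dirname(directory)
        if parent == directory:
            return None                   # reached the filesystem root
        directory = parent
```

Because a directory and a file cannot share the name ".git", the user
gets exactly one of the two kinds of linkage per checkout.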

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 23:15     ` Jakub Narebski
  2006-12-17  0:01       ` Josef Weidendorfer
@ 2006-12-17  0:08       ` Josef Weidendorfer
  1 sibling, 0 replies; 24+ messages in thread
From: Josef Weidendorfer @ 2006-12-17  0:08 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Martin Waitz, git

On Sunday 17 December 2006 00:15, Jakub Narebski wrote:
> > I still don't get the advantage of a .gitlink file over an ordinary
> > repository with alternates or a symlink.

Forgot one thing:
To separate the repository files from a checkout, a symlink is not
enough, as you lose the linkage when you move the checkout or the
repository; you could use an absolute symlink target, but that also
has inconveniences.

So you want some kind of smart linking. And this is another
important part of the .gitlink file proposal.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 22:58     ` Martin Waitz
  2006-12-16 23:14       ` Sven Verdoolaege
@ 2006-12-17  0:32       ` Josef Weidendorfer
  1 sibling, 0 replies; 24+ messages in thread
From: Josef Weidendorfer @ 2006-12-17  0:32 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Junio C Hamano, skimo, git

On Saturday 16 December 2006 23:58, Martin Waitz wrote:
> > If the plumbing layer does not have to (although I haven't
> > thought it through, it does feel like it even shouldn't) unwrap
> > "link" and let the Porcelain layer to deal with it, that would
> > certainly make rev-list/revision.c part simpler.
> 
> Yes.  However, it makes other things more complicated.
> If the plumbing does not do all the subproject stuff and you don't have
> everything in one database

Even without plumbing doing the subproject stuff, we could use the
same, unified database for the objects. Or do I miss something?

As you said: the problem is submodule commits in superproject trees which
are not reachable by refs of the submodule. However, we only need these
commits when cloning/fetching the submodule in the scope of cloning/fetching
the superproject; we simply cannot use a normal repository of the
submodule here, as these commits would not be available there.

We should add a plumbing command for "Give me the minimal set of commits
(from all submodules) which have all the submodule link object ids as ancestors
which appear in the history of a given commit (from a superproject)".
With this, building the set of objects to pack/fetch/clone into a unified
object database for a superproject with its submodules should be easy.
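The core of such a command could look like the sketch below.  It
assumes the submodule commit ids recorded in the superproject history
have already been collected into `linked`, and that `parents` maps each
submodule commit to its parent list; both names are hypothetical.

```python
def ancestors(parents, commit):
    """All proper ancestors of `commit` in the submodule history."""
    seen, stack = set(), list(parents.get(commit, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen

def minimal_heads(parents, linked):
    """Smallest subset of `linked` from which every linked commit is reachable."""
    covered = set()
    for c in linked:
        covered |= ancestors(parents, c)
    # A linked commit that is an ancestor of another one is redundant.
    return set(linked) - covered
```

The resulting heads could then be fed to the usual object listing to
decide what to pack, fetch or keep when pruning.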

It is also needed for pruning the unified object database.
Pruning in submodules simply would print out an error "Pruning in submodules
not supported. Prune in the superproject instead".

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 18:32 Subprojects tasks Junio C Hamano
  2006-12-16 18:45 ` Jakub Narebski
  2006-12-16 20:35 ` Sven Verdoolaege
@ 2006-12-17  8:48 ` Alan Chandler
  2006-12-17 10:01   ` Jakub Narebski
  2006-12-17 11:17 ` Martin Waitz
  3 siblings, 1 reply; 24+ messages in thread
From: Alan Chandler @ 2006-12-17  8:48 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Waitz

On Saturday 16 December 2006 18:32, Junio C Hamano wrote:
> Because I am primarily a plumber, I was thinking about the
> changes that need to be done at the plumbing level.  I only
> looked at the prototype when it was announced, and I do not know
> the progress you made since then.  Could you tell us the current
> status?
>
> I am assuming that the overall design is based on what Linus
> proposed long time ago with his "gitlink" object.  That is,
>
>  * the index and the tree object for the superproject has a
>    "link" object that tells there is a directory and the
>    corresponding commit object name from the subproject.  Unlike
>    my previous "bind commit" based prototype, index does not
>    have any blobs nor trees from the subproject.
>
>  * the subproject is on its own, and can exist unaware of the
>    existence of its superproject (there is no back-link at the
>    object layer).

I have been following the submodules (subprojects - is there any 
difference?) discussion from afar, getting lost quite frequently in 
what is actually being discussed and why.  I don't think the idea I 
express below has been mentioned, but apologies if it has.

One element I felt has been missing is a vigorous discussion of what 
submodules are for and what are their use cases.  The "submodule is on 
its own" issue seems to have crept into the discussion - but there was 
one use case that was discussed, where some actually help by the 
submodule could be useful.

The use case was when the supermodule wanted to make use of the header 
files of the submodule because it was using the submodule as a library.

This did make me wonder if the submodule should not export some form 
of "approved" set of content (or files - and I do think care is needed 
here as to which it is when we think about renames) which is both

a) a subset of the full tree that is stored at commit time, and
b) does itself have a commit history 

(I am clearly thinking that would be the standard "include" files, but 
not the actual source of the library - but it might include the 
library itself as a prebuilt binary library?)

This does suggest it is a tree object stored in the repository - and 
that it is linked in time via a set of commit objects - I'll call them 
the "export commits".  I am not sure whether a new commit should be 
made every time there is any change (via a normal commit) to this 
content, or (and I slightly favour this) there is a new commit made 
which is somewhat akin to a tag when the project wants to release a new 
version of its interface. 

Supermodules which then made use of that library would then do some 
form of shallow clone, shallow in the sense that it only pulled in the 
exported commit content and also (possibly) shallow in the sense that 
it does not need to go back in time to get old versions of the exported 
commit.

-- 
Alan Chandler

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-17  8:48 ` Alan Chandler
@ 2006-12-17 10:01   ` Jakub Narebski
  0 siblings, 0 replies; 24+ messages in thread
From: Jakub Narebski @ 2006-12-17 10:01 UTC (permalink / raw)
  To: git

Alan Chandler wrote:

> The use case was when the supermodule wanted to make use of the header 
> files of the submodule because it was using the submodule as a library.
> 
> This did make me wonder if the submodule should not export some form 
> of "approved" set of content (or files - and I do think care is needed 
> here as to which it is when we think about renames) which is both
> 
> a) a subset of the full tree that is stored at commit time, and
> b) does itself have a commit history 
> 
> (I am clearly thinking that would be the standard "include" files, but 
> not the actual source of the library - (but it might include the 
> library it self as a prebuilt binary library?)
> 
> This does suggest it is a tree object stored in the repository - and 
> that it is linked in time via a set of commit objects - I'll call them 
> the "export commits".  I am not sure whether a new commit should be 
> made everytime there is any change (via a normal commit) to this 
> content, or (and I slightly favour this) there is a new commit made 
> which is somewhat akin to a tag when the project wants to release a new 
> version of its interface. 

In the absence of sparse/partial checkout, and its use in submodule
support, this can be solved purely at the porcelain level.

You would have to simply maintain separate 'includes' branch, similarly
to how 'html' and 'man' (and 'todo') branches are maintained in git.git
repository -- it would be your 'set of commit objects'. Then the only
thing that would be needed is some commit / post-commit hook which would
examine if commit touches "include" files and if it does, make a commit
in the 'includes' ('inc' for short) branch.

Supermodule would then use either 'master' branch for full sources,
or 'includes' branch for headers only.
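The check such a hook would need can be sketched as follows.  The
"include/" prefix and the function name are assumptions for
illustration; the actual commit on the 'includes' branch is left out.

```python
def paths_for_includes_branch(changed_paths, prefix="include/"):
    """Return the changed paths that belong on the 'includes' branch."""
    return [p for p in changed_paths if p.startswith(prefix)]

def needs_includes_commit(changed_paths, prefix="include/"):
    """True when a commit touched at least one exported header."""
    return bool(paths_for_includes_branch(changed_paths, prefix))
```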

P.S. Cc: Alan Chandler <alan@chandlerfamily.org.uk>, 
Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Subprojects tasks
  2006-12-16 18:32 Subprojects tasks Junio C Hamano
                   ` (2 preceding siblings ...)
  2006-12-17  8:48 ` Alan Chandler
@ 2006-12-17 11:17 ` Martin Waitz
  3 siblings, 0 replies; 24+ messages in thread
From: Martin Waitz @ 2006-12-17 11:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

hoi :)

On Sat, Dec 16, 2006 at 10:32:36AM -0800, Junio C Hamano wrote:
> Because I am primarily a plumber, I was thinking about the
> changes that need to be done at the plumbing level.  I only
> looked at the prototype when it was announced, and I do not know
> the progress you made since then.  Could you tell us the current
> status?

Most of the things you described are already implemented in
http://git.admingilde.org/tali/git.git/module2

If there is interest in it, I can generate some nice patches out of it.
However, with Linus's concerns about scalability I'm not sure it is ready
yet.  But if you prefer patches for discussion I'll send them here.

> I am assuming that the overall design is based on what Linus
> proposed long time ago with his "gitlink" object.  That is,
> 
>  * the index and the tree object for the superproject has a
>    "link" object that tells there is a directory and the
>    corresponding commit object name from the subproject.  Unlike
>    my previous "bind commit" based prototype, index does not
>    have any blobs nor trees from the subproject.

In contrast to your description, my implementation does not
introduce a new "link" type but instead adds the reference to the
submodule commit directly to the parent tree object and to its
index.

>  * the subproject is on its own, and can exist unaware of the
>    existence of its superproject (there is no back-link at the
>    object layer).

yes, this is essential.
There may be links in this particular instance of the submodule, i.e.
the repository/working directory which are checked out by the
supermodule may be coupled to the supermodule, but it must always
be possible to clone/push/pull the submodule alone.

>  * the subproject and the superproject are loosely coupled.  An
>    act of committing in one does not automatically make a
>    corresponding commit in the other at the plumbing level.

this is essential, too.

> With the index, in addition to the mode bits and "link" object's
> SHA-1, you would need to decide what to do with ce_match_stat(),
> to keep track of the information "update-index --refresh" updates.
> My recommendation is to:
> 
>  (1) Look at the directory the "link" is at, and find .git/
>      subdirectory (that is the $GIT_DIR for the subproject) and
>      its .git/HEAD;
> 
>  (2) If that points at a loose ref, use the file's stat()
>      information (e.g stat("$sub/.git/refs/heads/master"));
> 
>  (3) Otherwise, use the packed-ref file's stat() information
>      (e.g stat("$sub/.git/packed-refs")).
> 
> Then ce_match_stat() for a "link" entry can do the same
> computation and tell if the subproject has changed its HEAD.

yes.
However, I decided not to read in HEAD but some specific branch.
This may sound arbitrary and I did not really like to make
"master" (the branch I chose) even more special, but you will
understand it when looking at the checkout below.

> Another issue with the index is what to put in the cache_tree
> structure; I think "link" can be treated just like blob (both
> files and symlinks).

Hmm, I never cared about cache_tree up to now.  I guess I should learn
about it to understand the influence on submodules.

> Then read-tree (bulk of it is in unpack-trees.c) needs to be
> taught to read in "link" and put that into the index -- this
> should be straightforward.
> 
> After you have a working index, you should be able to do
> write-tree (writes the new "link" entry as is, without
> descending into the subproject) trivially.

Where do you want to write the link to?
What I do here is update one branch ("master") of the submodule to
the new commit which was stored in the parent index.
If this branch is currently checked out, the working directory will
be updated, too.  If there is no working directory for the submodule
yet, it will be created.

Updating one special branch instead of HEAD is because the submodule
commits which are stored in the supermodule really can be considered
as a special branch which happens to not be stored in an ordinary ref.
In order to make it visible to the user the commit is copied to a
normal ref.
This approach also integrates better with branches in the submodule.
When you want to start parallel development in a branch you either
want to do this in the complete supermodule scope -- then you have
to branch the supermodule --, or you want to do it independently of the
version stored in the supermodule -- then you don't want a supermodule
checkout to mess with your branch.
So it makes sense to have one branch which is tracked by the parent
and other branches which are independent from the parent.

> It is debatable what 'checkout-index -f' should do when the
> subproject is already checked out and its HEAD points at a
> different commit.  I am tempted to say that it should go there
> and run "reset --hard", but I feel uneasy about that because it
> is a blatant layering violation.  Maybe it should simply ignore
> link entries and let the Porcelain layer take care of them.

Where exactly do you see the layering violation?
Well I think it makes sense to use read-tree -m <old> <new> in the
submodule instead of a hard reset, but when the supermodule is checked
out the submodule really should move to its new version.
(At least the branch which is tracked by the parent should do so.)

> Then there are three diff- brothers at the plumbing level.

All the diff stuff is what is still missing in my implementation.
If you ask for a diff in the parent, it will happily diff the
submodules' commit objects ;-)

> I suspect the hardest part is "rev-list --objects" (now most of
> it is found in revision.c).  Theoretically, if the code can
> handle "tag"s, it should be able to handle "link"s, but I have a
> feeling that the ancestry traversal code that walks commits is
> not prepared to see "commit" object to appear from somewhere in
> the middle of traversal.  A commit so far can be wrapped only by
> tags zero or more times, and a tag never appears inside anything
> but another tag, so the code can just keep peeling the tag until
> it sees a non-tag and after that it will be living in the world
> that has only commit->tree->blob hierarchy, and can afford to do
> the ancestry based solely on "commit" and can treat reachability
> for "tree" and "blob" as afterthought.  But I think the updated
> code needs to know that "link" needs to be unwrapped and
> contained "commit" needs to be injected back to the ancestry
> walking machinery.

Well, a simple and dumb version (i.e. my current implementation) can
just do the same for commits as it does for trees: just recursively
descend.  Of course this is prohibitive in anything but toy projects.

A better approach is to put all the submodule commits on the pending
list and do the normal ancestry walk for them again.  But this would
also need all reachable objects from all modules to be known to one
process.

This could be solved by having one pending list per submodule and
then flush all objects before moving to the next submodule, or
just processing the submodule in a different process.
But when the SEEN information is not shared between submodules then
rev-list could output the same object twice if a blob or tree is
used by several submodules.  This may not be a problem if all the
code which processes rev-list output is idempotent, but I haven't
looked into this in detail.
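
One way to picture the duplicate problem: if each submodule is walked
separately and writes its own object list, a later step can drop objects
that were already listed.  The sketch below assumes lines in the
"<object name> <path>" shape that "rev-list --objects" prints; the
function name and the list files are placeholders, not part of any
existing tool.

```shell
# Merge several per-submodule object lists, keeping only the first line
# seen for each object name (first field).  Input order is preserved.
# Hypothetical helper for illustration.
merge_object_lists() {
	awk '!seen[$1]++' "$@"
}
```

This only hides duplicates after the fact; it does not save the cost of
walking a shared blob or tree once per submodule.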

Of course, when rev-list for submodules is already split out there
is the valid question if it really makes sense to descend into
submodules when doing rev-list.
Not doing so would naturally decouple sub- from supermodule but then
a lot of operations that depend on rev-list (clone, push, pull)
have to be heavily modified.

Getting this straight in an efficient way is the next challenge.

-- 
Martin Waitz


* Re: Subprojects tasks
  2006-12-17  0:01       ` Josef Weidendorfer
@ 2006-12-17 11:45         ` Martin Waitz
  2006-12-17 13:01           ` Jakub Narebski
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Waitz @ 2006-12-17 11:45 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Jakub Narebski, git

hoi :)

On Sun, Dec 17, 2006 at 01:01:09AM +0100, Josef Weidendorfer wrote:
> IMHO it simply is added flexibility to allow a checkout to be separate from
> the .git/ directory, same as explicitly setting $GIT_DIR would do.
> So this .gitlink file is on the one hand one kind of convenience for users
> which want to keep their repository separate, yet do not want to specify
> $GIT_DIR all the time in front of git commands.
> The .gitlink file simply makes the linkage to the separate repository
> persistent.

I can see the reason for wanting to use another object database,
but HEAD and index should always be stored together with the
checked out directory.  So perhaps we just need some smart way to
search for the object database, but keep the .git directory.

-- 
Martin Waitz


* Re: Subprojects tasks
  2006-12-17 11:45         ` Martin Waitz
@ 2006-12-17 13:01           ` Jakub Narebski
  2006-12-17 13:48             ` Martin Waitz
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-17 13:01 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, git, Junio C Hamano

Martin Waitz wrote:
> On Sun, Dec 17, 2006 at 01:01:09AM +0100, Josef Weidendorfer wrote:

>> IMHO it simply is added flexibility to allow a checkout to be separate from
>> the .git/ directory, same as explicitly setting $GIT_DIR would do.
>> So this .gitlink file is on the one hand one kind of convenience for users
>> which want to keep their repository separate, yet do not want to specify
>> $GIT_DIR all the time in front of git commands.
>> The .gitlink file simply makes the linkage to the separate repository
>> persistent.
> 
> I can see the reason for wanting to use another object database,
> but HEAD and index should always be stored together with the
> checked out directory.  So perhaps we just need some smart way to
> search for the object database, but keep the .git directory.

Well, in the .gitlink proposal you could specify GIT_DIR for checkout,
or separately: GIT_OBJECT_DIRECTORY, GIT_INDEX_FILE, GIT_REFS_DIRECTORY
(does not exist yet), GIT_HEAD_FILE (does not exist yet, and I suppose
it wouldn't be easy to implement it). By the way, that's why I'm for
.gitlink name for the file, not .git -- this way .gitlink can "shadow"
what's in .git, for example specifying in a smart way where to search
(where to find) object database, but HEAD and index would be stored
together with the checked out directory in .git

By the way, I'm rather partial to supermodule following HEAD in submodule,
not specified branch. First, I think it is easier from implementation
point of view: you don't have to remember which branch supermodule should
take submodule commits from; and this cannot be fixed branch name like
'master'. For example 'maint' branch of supermodule could track 'maint'
branch of submodule, 'master' branch of supermodule track 'master'
branch of submodule, 'next' branch of supermodule track 'master' (!)
branch of submodule, 'pu' branch of supermodule track 'next' (!) branch
of submodule. 

Second, if you want to do some independent work on the module not related
to work on submodule you should really clone (clone -l -s) submodule
and work in separate checkout; the complaint that with tracking HEAD
you can check-in wrong version of submodule to supermodule commit
doesn't hold, because you still would have problem that _tree_
of supermodule would have wrong version of submodule. And moving to
using single defined branch of submodule brings multitude of other
problems: for example you might usually track 'master' version of
submodule, but for a short time need to track 'next' branch because
it has functionality you need; and another time you need to move
to 'maint' branch or even your own branch because 'master' version
breaks something in supermodule.

Hmmm... I wonder how the planned support for checking out tags, non-head
branches (e.g. tracking/remote branches) and arbitrary commits, while
forbidding committing when HEAD is not a refs/heads/ branch, would
affect submodules / subprojects...

-- 
Jakub Narebski


* Re: Subprojects tasks
  2006-12-17 13:01           ` Jakub Narebski
@ 2006-12-17 13:48             ` Martin Waitz
  2006-12-17 14:29               ` Jakub Narebski
  2006-12-17 23:23               ` Josef Weidendorfer
  0 siblings, 2 replies; 24+ messages in thread
From: Martin Waitz @ 2006-12-17 13:48 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Josef Weidendorfer, git, Junio C Hamano

hoi :)

On Sun, Dec 17, 2006 at 02:01:09PM +0100, Jakub Narebski wrote:
> Well, in the .gitlink proposal you could specify GIT_DIR for checkout,
> or separately: GIT_OBJECT_DIRECTORY, GIT_INDEX_FILE, GIT_REFS_DIRECTORY
> (does not exist yet), GIT_HEAD_FILE (does not exist yet, and I suppose
> it wouldn't be easy to implement it). By the way, that's why I'm for
> .gitlink name for the file, not .git -- this way .gitlink can "shadow"
> what's in .git, for example specifying in a smart way where to search
> (where to find) object database, but HEAD and index would be stored
> together with the checked out directory in .git

What about .git/link or something?
(Obviously without the capability to change GIT_DIR)

> By the way, I'm rather partial to supermodule following HEAD in submodule,
> not specified branch. First, I think it is easier from implementation
> point of view: you don't have to remember which branch supermodule should
> take submodule commits from; and this cannot be fixed branch name like
> 'master'. For example 'maint' branch of supermodule could track 'maint'
> branch of submodule, 'master' branch of supermodule track 'master'
> branch of submodule, 'next' branch of supermodule track 'master' (!)
> branch of submodule, 'pu' branch of supermodule track 'next' (!) branch
> of submodule. 

The version tracked by the supermodule is completely independent from
any branches you define in your submodule.
It is of course possible to use different versions of your submodule in
different branches of your supermodule.  But the supermodule does not
know the name of these branches.

In the setup you described a git-checkout in the supermodule would have
to switch to a different branch in the submodule, depending on the
branchname which would have to be stored in the supermodule.
This is a lot more complex.

Your scenario can also be solved in this way:

	cd supermodule
	(cd sub && git-reset --hard origin/master)
	git add sub && git commit -m "track master of sub"
	git checkout next
	(cd sub && git-reset --hard origin/master)
	git add sub && git commit -m "track master of sub"
	git checkout pu
	(cd sub && git-reset --hard origin/next)
	git add sub && git commit -m "track next of sub"
	git checkout maint
	(cd sub && git-reset --hard origin/maint)
	git add sub && git commit -m "track maint of sub"

You only store a link to the commit of the current submodule version,
just like a normal ref.  The reference stored in the supermodule really
is equivalent to a normal ref, just that it is stored and updated
slightly differently from a normal one.

So whenever you checkout a different version of the supermodule, the
submodule ref automatically gets the correct version.  In the example
above, when you checkout supermodule's pu, your submodule's branch will be
reset to its origin/next (to be more precise: to the commit which was at
the tip of origin/next at the time it was stored in the supermodule).

The fact that the reference to the current submodule commit does not
only exist in the supermodule tree but also as a physical ref in the
submodule is very similar to normal files: you have one version stored
in the object database, one in the index and one as a real file in the
working directory (and this working file is the equivalent of the
submodule ref which is stored in submodule/.git/refs/whatever)

The reference in the submodule is just a way to be able to work on
the submodule.  Because well, refs are the kind of thing that is changed
by a commit.  And these submodule commits are exactly the kind of work
you want to store in the supermodule.  So the equivalent to a working
file is not the HEAD of the submodule, but the ref which gets all
changes which are intended for the supermodule.

The fact that the submodule repository still supports other branches has
nothing to do with submodule support.  These branches are totally
independent from the supermodule.

> Second, if you want to do some independent work on the module not related
> to work on submodule you should really clone (clone -l -s) submodule
> and work in separate checkout;

Yes.
But I really like the possibility to switch one module to a branch which
is not tracked by the parent, because it perhaps contains some debugging
code which is needed to debug some other submodule.  You can't move it
out because you need the common build infrastructure but you don't want
to branch the entire toplevel project because you don't want your
debugging changes to ever become visible at that level.

So by switching to a different branch you can effectively say: this is
temporary, not meant for the superproject.
If you change your mind later you can always merge the submodule branch
back to master.

> the complaint that with tracking HEAD you can check-in wrong version
> of submodule to supermodule commit doesn't hold, because you still
> would have problem that _tree_ of supermodule would have wrong version
> of submodule.

Sorry, I don't understand you here.

> And moving to using single defined branch of submodule brings
> multitude of other problems: for example you might usually track
> 'master' version of submodule, but for a short time need to track
> 'next' branch because it has functionality you need; and another time
> you need to move to 'maint' branch or even your own branch because
> 'master' version breaks something in supermodule.

That is no problem.
The supermodule can track whatever _version_ it wants.  You can set
it to any version which is available in the repository, including all
those well known external branches.
But the supermodule itself does not know (and should not know) about
"maint" / "next" / whatever branch names in the submodule.

> Hmmm... I wonder how planned allowing to checking out tags, non-head
> branches (e.g. tracking/remote branches) and arbitrary commits but
> forbidding committing when HEAD is not a refs/heads/ branch would
> affect submodules / subprojects...

It only affects submodules if you really track HEAD directly.

-- 
Martin Waitz


* Re: Subprojects tasks
  2006-12-17 13:48             ` Martin Waitz
@ 2006-12-17 14:29               ` Jakub Narebski
  2006-12-17 19:54                 ` Martin Waitz
  2006-12-17 23:23               ` Josef Weidendorfer
  1 sibling, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-17 14:29 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, git, Junio C Hamano

Martin Waitz wrote:
> On Sun, Dec 17, 2006 at 02:01:09PM +0100, Jakub Narebski wrote:

>> Well, in the .gitlink proposal you could specify GIT_DIR for checkout,
>> or separately: GIT_OBJECT_DIRECTORY, GIT_INDEX_FILE, GIT_REFS_DIRECTORY
>> (does not exist yet), GIT_HEAD_FILE (does not exist yet, and I suppose
>> it wouldn't be easy to implement it). By the way, that's why I'm for
>> .gitlink name for the file, not .git -- this way .gitlink can "shadow"
>> what's in .git, for example specifying in a smart way where to search
>> (where to find) object database, but HEAD and index would be stored
>> together with the checked out directory in .git
> 
> What about .git/link or something?
> (Obviously without the capability to change GIT_DIR)

Well, the .gitlink proposal as it is now (by Josef) serves both as a way
to implement lightweight checkout (i.e. having additional working dir to
some repository, or having working dir separate from bare repository),
and as a way to have "smart" submodules (which you can move and rename)
in submodules/subproject support.

Besides, I'd rather either use config file for this (core.link or
core.git_dir), or use .git/GIT_DIR.
 
>> By the way, I'm rather partial to supermodule following HEAD in submodule,
>> not specified branch. First, I think it is easier from implementation
>> point of view: you don't have to remember which branch supermodule should
>> take submodule commits from; and this cannot be fixed branch name like
>> 'master'. 
[...]
> In the setup you described a git-checkout in the supermodule would have
> to switch to a different branch in the submodule, depending on the
> branchname which would have to be stored in the supermodule.
> This is a lot more complex.

O.K. Now I understand why you prefer specified branch to HEAD.
I had forgotten that checkout must update submodule ref, and if we track HEAD
we would have to remember the branch it pointed to.

By the way, should this ref be in submodule, or in supermodule, e.g. in
refs/modules/<name>/HEAD? And there is the problem of _what_ branch
that should be.

Both approaches have advantages and disadvantages...
-- 
Jakub Narebski


* Re: Subprojects tasks
  2006-12-17 14:29               ` Jakub Narebski
@ 2006-12-17 19:54                 ` Martin Waitz
  2006-12-17 23:27                   ` Josef Weidendorfer
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Waitz @ 2006-12-17 19:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Josef Weidendorfer, git, Junio C Hamano

hoi :)

On Sun, Dec 17, 2006 at 03:29:02PM +0100, Jakub Narebski wrote:
> By the way, should this ref be in submodule, or in supermodule, e.g. in
> refs/modules/<name>/HEAD? And there is the problem of _what_ branch
> that should be.

At the moment I simply use refs/heads/master of the submodule
repository, just because it is the default branch anyway.

In order to make the submodule refs which are not added to the
supermodule available to the supermodule anyway (for fsck and prune),
I added a symlink .git/refs/module/<submodule> -> <submodule>/.git/refs,
so that the submodule branch is also available as
refs/module/<submodule>/heads/master in the supermodule.
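
A rough sketch of that symlink setup, for illustration only.  The function
name is invented, and the link target is written as an absolute path so
that it resolves no matter where the link lives; that is an assumption,
the message above does not say how the target is spelled.

```shell
# Expose the refs of a submodule under .git/refs/module/<submodule>
# of the supermodule.
#   $1: supermodule working directory
#   $2: path of the submodule inside the supermodule
# Hypothetical helper for illustration.
link_submodule_refs() {
	super=$1
	sub=$2
	link="$super/.git/refs/module/$sub"
	# Create parent directories, also for nested paths like lib/sub.
	mkdir -p "$(dirname "$link")" || return 1
	# Resolve the refs directory of the submodule to an absolute path.
	target=$(cd "$super/$sub/.git/refs" && pwd) || return 1
	ln -s "$target" "$link"
}
```

After link_submodule_refs . sub, the file sub/.git/refs/heads/master is
also reachable as .git/refs/module/sub/heads/master.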

But I expect that all this setup stuff can be greatly simplified with a
little bit more knowledge of submodules in the core.  But this cleanup
is for later, when the basis is settled.

-- 
Martin Waitz


* Re: Subprojects tasks
  2006-12-17 13:48             ` Martin Waitz
  2006-12-17 14:29               ` Jakub Narebski
@ 2006-12-17 23:23               ` Josef Weidendorfer
  2006-12-18  7:44                 ` Martin Waitz
  1 sibling, 1 reply; 24+ messages in thread
From: Josef Weidendorfer @ 2006-12-17 23:23 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Jakub Narebski, git, Junio C Hamano

On Sunday 17 December 2006 14:48, Martin Waitz wrote:
> The version tracked by the supermodule is completely independent from
> any branches you define in your submodule.
> It is of course possible to use different versions of your submodule in
> different branches of your supermodule.  But the supermodule does not
> know the name of these branches.

I see that you always use "refs/heads/master" in the submodule.
What happens if you do development in the submodule, create a new commit
there, and want to switch supermodule branch afterwards?
Wouldn't you lose your new work, as "refs/heads/master" has to be reset
to another commit when you switch the supermodule branch?

IMHO it would be nice to have refs in the submodule matching all the
branches/tags of the supermodule.
Meaning: "this is the commit which is used by branch/tag XYZ in the
supermodule". This can be valuable information, and a "gitk --all" in
the submodule would show you all the uses of your subproject in the
scope of the given superproject.
We could occupy the local refs namespace of the
submodule with the same refs as there are in the supermodule. But that
is no problem as the original branches of the subproject would be
in "refs/remotes/".

When switching branches in the supermodule, it simply would switch
to the same name in submodules. The submodule refs would not need
to match the submodule object in the tree of the supermodule; instead,
it would represent the development done in the submodule while on a
given branch in the supermodule. Thus, this would allow to do bug fix commits
for a submodule at all places where the supermodule has a branch, without
the need to switch supermodule branches.
However, "git commit" in branch X in the supermodule should give a warning
when submodules are not all at the same branch X, as the commit would use
branch X for committing.

> > Second, if you want to do some independent work on the module not related
> > to work on submodule you should really clone (clone -l -s) submodule
> > and work in separate checkout;
> 
> Yes.
> But I really like the possibility to switch one module to a branch which
> is not tracked by the parent, because it perhaps contains some debugging
> code which is needed to debug some other submodule.  You can't move it
> out because you need the common build infrastructure but you don't want
> to branch the entire toplevel project because you don't want your
> debugging changes to ever become visible at that level.

In general, I agree with not following submodule's HEAD for supermodule
commits. As you cannot store any submodule branch names, this really
would be confusing, as after switching to another supermodule branch
and back again, the submodule branch name would reset to a given name
("master" in your current implementation).

But why wouldn't you create a temporary branch "debug_submodule1" in the
supermodule for your use case? Branches are cheap with git, even in supermodules.
Supermodule branches also are purely local, you never have to publish
them anywhere, and can delete them afterwards.


* Re: Subprojects tasks
  2006-12-17 19:54                 ` Martin Waitz
@ 2006-12-17 23:27                   ` Josef Weidendorfer
  2006-12-18  7:45                     ` Martin Waitz
  0 siblings, 1 reply; 24+ messages in thread
From: Josef Weidendorfer @ 2006-12-17 23:27 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Jakub Narebski, git, Junio C Hamano

On Sunday 17 December 2006 20:54, Martin Waitz wrote:
> I added a symlink .git/refs/module/<submodule> -> <submodule>/.git/refs,
> so that the submodule branch is also available as
> refs/module/<submodule>/heads/master in the supermodule.

Ah.
What is "<submodule>" in your implementation?
Is this some encoding of the path where the submodule currently lives
in the supermodule, or are you giving the submodules unique names
in the context of the supermodule?



* Re: Subprojects tasks
  2006-12-17 23:23               ` Josef Weidendorfer
@ 2006-12-18  7:44                 ` Martin Waitz
  2006-12-18 10:30                   ` Josef Weidendorfer
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Waitz @ 2006-12-18  7:44 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Jakub Narebski, git, Junio C Hamano

hoi :)

On Mon, Dec 18, 2006 at 12:23:45AM +0100, Josef Weidendorfer wrote:
> I see that you always use "refs/heads/master" in the submodule.
> What happens if you do development in the submodule, create a new commit
> there, and want to switch supermodule branch afterwards?
> Wouldn't you lose your new work, as "refs/heads/master" has to be reset
> to another commit when you switch the supermodule branch?

It should behave the same as for files:
Refuse to update the working directory if files (or the submodule here)
are dirty.  I guess this is not yet handled correctly by my prototype,
but it should not be hard to do.

> IMHO it would be nice to have refs in the submodule matching all the
> branches/tags of the supermodule.
> Meaning: "this is the commit which is used by branch/tag XYZ in the
> supermodule". This can be valuable information, and a "gitk --all" in
> the submodule would show you all the uses of your subproject in the
> scope of the given superproject.

I like the idea.  Perhaps make them available similar to the remotes
information in refs/tracked/{heads,tags} or something.

> When switching branches in the supermodule, it simply would switch
> to the same name in submodules.

Nice idea, but I don't yet know how it really works out.
It may be confusing to the user if he manually switches the branch in
the submodule to another branch of the supermodule.  Then he really is
using one tracked branch, but not the currently tracked branch.

> Thus, this would allow to do bug fix commits for a submodule at all
> places where the supermodule has a branch, without the need to switch
> supermodule branches.

Hmm, but when switching to another supermodule branch it would try to
update the submodule branch.
And simply allowing the current submodule branch to be a fast forward of
the submodule version that the parent wants to set is bad, as you
would not be able to go back to an old supermodule version then.

> > > Second, if you want to do some independent work on the module not related
> > > to work on submodule you should really clone (clone -l -s) submodule
> > > and work in separate checkout;
> > 
> > Yes.
> > But I really like the possibility to switch one module to a branch which
> > is not tracked by the parent, because it perhaps contains some debugging
> > code which is needed to debug some other submodule.  You can't move it
> > out because you need the common build infrastructure but you don't want
> > to branch the entire toplevel project because you don't want your
> > debugging changes to ever become visible at that level.
> 
> In general, I agree with not following submodule's HEAD for supermodule
> commits. As you cannot store any submodule branch names, this really
> would be confusing, as after switching to another supermodule branch
> and back again, the submodule branch name would reset to a given name
> ("master" in your current implementation).
> 
> But why wouldn't you create a temporary branch "debug_submodule1" in the
> supermodule for your use case? Branches are cheap with git, even in supermodules.
> Supermodule branches also are pure local, you never have to publish
> it somewhere, and can delete it afterwards.

Sure, you can of course always use supermodule branches.
I just wanted to point out that it still is useful to have submodule
branches which are independent from supermodule branches.

-- 
Martin Waitz


* Re: Subprojects tasks
  2006-12-17 23:27                   ` Josef Weidendorfer
@ 2006-12-18  7:45                     ` Martin Waitz
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Waitz @ 2006-12-18  7:45 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Jakub Narebski, git, Junio C Hamano

On Mon, Dec 18, 2006 at 12:27:25AM +0100, Josef Weidendorfer wrote:
> On Sunday 17 December 2006 20:54, Martin Waitz wrote:
> > I added a symlink .git/refs/module/<submodule> -> <submodule>/.git/refs,
> > so that the submodule branch is also available as
> > refs/module/<submodule>/heads/master in the supermodule.
> 
> Ah.
> What is "<submodule>" in your implementation?
> Is this some encoding of the path where the submodule currently lives
> in the supermodule, or are you giving the submodules unique names
> in the context of the supermodule?

At the moment, it's just the path inside the parent.

-- 
Martin Waitz


* Re: Subprojects tasks
  2006-12-18  7:44                 ` Martin Waitz
@ 2006-12-18 10:30                   ` Josef Weidendorfer
  0 siblings, 0 replies; 24+ messages in thread
From: Josef Weidendorfer @ 2006-12-18 10:30 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Jakub Narebski, git, Junio C Hamano

On Monday 18 December 2006 08:44, Martin Waitz wrote:
> On Mon, Dec 18, 2006 at 12:23:45AM +0100, Josef Weidendorfer wrote:
> > I see that you always use "refs/heads/master" in the submodule.
> > What happens if you do development in the submodule, create a new commit
> > there, and want to switch supermodule branch afterwards?
> > Wouldn't you lose your new work, as "refs/heads/master" has to be reset
> > to another commit when you switch the supermodule branch?
> 
> It should behave the same as for files:
> Refuse to update the working directory if files (or the submodule here)
> are dirty.  I guess this is not yet handled correctly by my prototype,
> but it should not be hard to do.

Ah, I see.
Yes, this is consistent with other dirty files.

> > IMHO it would be nice to have refs in the submodule matching all the
> > branches/tags of the supermodule.
> > Meaning: "this is the commit which is used by branch/tag XYZ in the
> > supermodule". This can be valuable information, and a "gitk --all" in
> > the submodule would show you all the uses of your subproject in the
> > scope of the given superproject.
> 
> I like the idea.  Perhaps make them available similar to the remotes
> information in refs/tracked/{heads,tags} or something.

Yes.
However, you want to do development on these branches. And
"refs/tracked/..." is read-only. However, taking the whole local
refs namespace is not good, as you perhaps want branches independent
of the supermodule, which could give name conflicts.
What about using "refs/{heads,tags}/supermodule/..."? This could
be a compromise.
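
To make the proposed layout concrete, here is a small sketch of how a
supermodule ref name could be mapped to the matching ref inside a
submodule.  The function name is made up; only the
"refs/{heads,tags}/supermodule/..." prefix comes from the suggestion above.

```shell
# Map a supermodule ref name to the ref that would mirror it in a
# submodule under the proposed "supermodule/" namespace.
# Hypothetical helper for illustration.
tracked_ref_name() {
	case "$1" in
	refs/heads/*)
		echo "refs/heads/supermodule/${1#refs/heads/}"
		;;
	refs/tags/*)
		echo "refs/tags/supermodule/${1#refs/tags/}"
		;;
	*)
		# Anything else (HEAD, remotes, ...) has no mirror ref.
		return 1
		;;
	esac
}
```

Branches of the submodule outside refs/heads/supermodule/ stay independent
of the supermodule, which avoids the name conflicts mentioned above.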

> > When switching branches in the supermodule, it simply would switch
> > to the same name in submodules.
> 
> Nice idea, but I don't yet know how it really works out.
> It may be confusing to the user if he manually switches the branch in
> the submodule to another branch of the supermodule.  Then he really is
> using one tracked branch, but not the currently tracked branch.

But you already have the same problem with your current approach, don't
you?

Actually, the most expected thing for the user really would be to use
HEAD in supermodule commits. Every other behavior can get confusing for
the user: (S)He simply expects the state of the checkout to be committed.
Any branch switching in submodules should be temporary.

Actually, you can be on a temporary branch in a submodule and still switch
branches in the supermodule. It is the same as with dirty files: The
modifications can be carried over to other branches and back, as long as
there are no conflicts. 

However, I think it is important to check that you are back on the
right branch when committing. With warning or even error.


Thread overview: 24+ messages
2006-12-16 18:32 Subprojects tasks Junio C Hamano
2006-12-16 18:45 ` Jakub Narebski
2006-12-16 23:01   ` Martin Waitz
2006-12-16 23:15     ` Jakub Narebski
2006-12-17  0:01       ` Josef Weidendorfer
2006-12-17 11:45         ` Martin Waitz
2006-12-17 13:01           ` Jakub Narebski
2006-12-17 13:48             ` Martin Waitz
2006-12-17 14:29               ` Jakub Narebski
2006-12-17 19:54                 ` Martin Waitz
2006-12-17 23:27                   ` Josef Weidendorfer
2006-12-18  7:45                     ` Martin Waitz
2006-12-17 23:23               ` Josef Weidendorfer
2006-12-18  7:44                 ` Martin Waitz
2006-12-18 10:30                   ` Josef Weidendorfer
2006-12-17  0:08       ` Josef Weidendorfer
2006-12-16 20:35 ` Sven Verdoolaege
2006-12-16 21:07   ` Junio C Hamano
2006-12-16 22:58     ` Martin Waitz
2006-12-16 23:14       ` Sven Verdoolaege
2006-12-17  0:32       ` Josef Weidendorfer
2006-12-17  8:48 ` Alan Chandler
2006-12-17 10:01   ` Jakub Narebski
2006-12-17 11:17 ` Martin Waitz
