* RFC: display dirty submodule working directory in git gui and gitk
@ 2010-01-02 15:33 Jens Lehmann
  2010-01-04  9:44 ` Johannes Schindelin
  0 siblings, 1 reply; 45+ messages in thread
From: Jens Lehmann @ 2010-01-02 15:33 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Junio C Hamano, Johannes Schindelin, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli

Now that we have much better output when displaying diffs of
submodules in git gui and gitk (many thanks to all involved!),
another usability issue shows up: A dirty working directory of
a submodule isn't visible in git gui or gitk.

So you might think a "submodule update" would be ok - as you
see no changes - just to see it fail because the submodule's
working directory is dirty.

Or - even worse - you /think/ you committed your changes in
a submodule while you didn't. That can lead to 'interesting'
problems which can be pretty hard to diagnose (like breaking
builds on other people's machines).


A possible solution could look like this:

AFAICS, git gui and gitk use "git diff-files" both to get the
file names of unstaged local changes and to later display the
actual differences.

If they could tell the diff core to also check the submodule
working directories and to output an extra line - maybe
something like "Submodule <name> contains uncommitted local
changes" - when a submodule's working directory is dirty,
git gui and gitk could show the submodule's state adequately.


What do you think about this approach?
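
A rough sketch of the idea above, not an implementation of anything in
git: it walks a list of submodule paths and produces the proposed notice
line for each one whose working directory is dirty. The helper names
(porcelain_status, dirty_submodule_lines) are made up for illustration,
and counting untracked files as "dirty" is an assumption; the real change
would live in the diff core rather than in a wrapper script.

```python
import subprocess
from typing import Callable, Iterable, List


def porcelain_status(submodule_path: str) -> str:
    """Return `git status --porcelain` output for the repository at submodule_path."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=submodule_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def dirty_submodule_lines(
    submodule_paths: Iterable[str],
    status_of: Callable[[str], str] = porcelain_status,
) -> List[str]:
    """Build one notice line per submodule whose working directory is dirty."""
    lines = []
    for path in submodule_paths:
        # Assumption: any non-empty porcelain output (modified, staged or
        # untracked files) counts as a dirty working directory.
        if status_of(path).strip():
            lines.append(f"Submodule {path} contains uncommitted local changes")
    return lines
```

Passing a different status_of callable makes it possible to try the logic
without touching a real repository.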


* Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-02 15:33 RFC: display dirty submodule working directory in git gui and gitk Jens Lehmann
@ 2010-01-04  9:44 ` Johannes Schindelin
  2010-01-04 10:44   ` Heiko Voigt
                     ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-04  9:44 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Git Mailing List, Junio C Hamano, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli

Hi,

On Sat, 2 Jan 2010, Jens Lehmann wrote:

> Now that we have much better output when displaying diffs of submodules 
> in git gui and gitk (many thanks to all involved!), another usability 
> issue shows up: A dirty working directory of a submodule isn't visible 
> in git gui or gitk.
> 
> So you might think a "submodule update" would be ok - as you see no
> changes - just to see it fail because the submodule's working directory
> is dirty.
> 
> Or - even worse - you /think/ you committed your changes in a submodule 
> while you didn't. That can lead to 'interesting' problems which can be 
> pretty hard to diagnose (like breaking builds on other people's
> machines).
> 
> 
> A possible solution could look like this:
> 
> AFAICS, git gui and gitk use "git diff-files" both to get the file names 
> of unstaged local changes and to later display the actual differences.
> 
> If they could tell the diff core to also check the submodule working
> directories and to output an extra line - maybe something like
> "Submodule <name> contains uncommitted local changes" - when a
> submodule's working directory is dirty, git gui and gitk could show the
> submodule's state adequately.

The real problem is that submodules in the current form are not very well 
designed.  For example, a submodule being at a different commit than in 
the superproject's index is not as fatal as the submodule having changes.

So in the long run, IMHO a proper redesign of the submodules would not 
make only a little sense (it does not help, though, that those who 
implemented and furthered the current approach over other discussed 
approaches do not use submodules themselves -- not even now).

In the short run, we can paper over the shortcomings of the submodules by
introducing a command line option "--include-submodules" to 
update-refresh, diff-files and diff-index, though.

The implementation might be a bit tricky as parts of Git's source code 
still use the_index, but at least adding the submodule's object database 
is no longer that difficult.

Ciao,
Dscho


* Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04  9:44 ` Johannes Schindelin
@ 2010-01-04 10:44   ` Heiko Voigt
  2010-01-04 11:46     ` submodules, was " Johannes Schindelin
  2010-01-04 17:04   ` Jens Lehmann
  2010-01-04 17:51   ` RFC: display dirty submodule working directory in git gui and gitk Nguyen Thai Ngoc Duy
  2 siblings, 1 reply; 45+ messages in thread
From: Heiko Voigt @ 2010-01-04 10:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli

Hi,

Johannes wrote:
> The real problem is that submodules in the current form are not very well
> designed.  For example, a submodule being at a different commit than in
> the superproject's index is not as fatal as the submodule having changes.
>
> So in the long run, IMHO a proper redesign of the submodules would not
> make only a little sense (it does not help, though, that those who
> implemented and furthered the current approach over other discussed
> approaches do not use submodules themselves -- not even now).

Do you mean the complete workflow (submodules are links to other git repos)
or the current implementation? Do you have links to other design
approaches/threads? Would be nice if we could take that into account for any
decision.

cheers Heiko


* submodules, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 10:44   ` Heiko Voigt
@ 2010-01-04 11:46     ` Johannes Schindelin
  2010-01-04 18:29       ` Avery Pennarun
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-04 11:46 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Lars Hjemli

Hi,

On Mon, 4 Jan 2010, Heiko Voigt wrote:

> Johannes wrote:
> > The real problem is that submodules in the current form are not very 
> > well designed.  For example, a submodule being at a different commit 
> > than in the superproject's index is not as fatal as the submodule 
> > having changes.
> >
> > So in the long run, IMHO a proper redesign of the submodules would not 
> > make only a little sense (it does not help, though, that those who 
> > implemented and furthered the current approach over other discussed 
> > approaches do not use submodules themselves -- not even now).
> 
> Do you mean the complete workflow (submodules are links to other git 
> repos) or the current implementation? Do you have links to other design 
> approaches/threads? Would be nice if we could take that into account for 
> any decision.

Unfortunately, I do not have any information about different approaches 
except the approach Subversion takes.  While Subversion's externals are 
not perfect for all applications, for some, they are.  So I consider this 
a serious shortcoming that Git does not support that workflow (and in 
fact, AFAIR Shawn's repo does not use submodules for that exact reason).

But I think that an important precondition to come up with a better design 
of the submodules is to have suffered the current implementation in 
real-world work using submodules. (Which reminds me very much of the 
autocrlf mess.)

Ciao,
Dscho


* Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04  9:44 ` Johannes Schindelin
  2010-01-04 10:44   ` Heiko Voigt
@ 2010-01-04 17:04   ` Jens Lehmann
  2010-01-04 22:29     ` submodules' shortcomings, was " Johannes Schindelin
  2010-01-04 17:51   ` RFC: display dirty submodule working directory in git gui and gitk Nguyen Thai Ngoc Duy
  2 siblings, 1 reply; 45+ messages in thread
From: Jens Lehmann @ 2010-01-04 17:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Git Mailing List, Junio C Hamano, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli

Am 04.01.2010 10:44, schrieb Johannes Schindelin:
> The real problem is that submodules in the current form are not very well 
> designed.

IMVHO using the tree sha1 for a submodule seems to be the 'natural' way
to include another git repo. And it gives the reproducibility i expect
from a scm. Or am i missing something?

It looks to me as if most shortcomings come from the fact that most git
commands tend to ignore submodules (and if they don't, like git gui and
gitk do now, they e.g. only show certain aspects of their state).

Submodules are in heavy use in our company since last year. Virtually
every patch i submitted for submodules came from that experience and
scratched an itch i or one of my colleagues had (and the situation did
already improve noticeably by the few things we changed). We are still
convinced that using submodules was the right decision. But some work
has still to be done to be able to use them easily and to get rid of
some pitfalls.


> In the short run, we can paper over the shortcomings of the submodules by
> introducing a command line option "--include-submodules" to 
> update-refresh, diff-files and diff-index, though.

Maybe this is the way to go for now (and hopefully we can turn this
option on by default later because we did the right thing ;-).


* Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04  9:44 ` Johannes Schindelin
  2010-01-04 10:44   ` Heiko Voigt
  2010-01-04 17:04   ` Jens Lehmann
@ 2010-01-04 17:51   ` Nguyen Thai Ngoc Duy
  2010-01-04 18:40     ` Jens Lehmann
  2 siblings, 1 reply; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-04 17:51 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli

On 1/4/10, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> The real problem is that submodules in the current form are not very well
>  designed.  For example, a submodule being at a different commit than in
>  the superproject's index is not as fatal as the submodule having changes.
>
>  So in the long run, IMHO a proper redesign of the submodules would not
>  make only a little sense (it does not help, though, that those who
>  implemented and furthered the current approach over other discussed
>  approaches do not use submodules themselves -- not even now).
>
>  In the short run, we can paper over the shortcomings of the submodules by
>  introducing a command line option "--include-submodules" to
>  update-refresh, diff-files and diff-index, though.

Incidentally I was just drafting git-super.sh to see how far it goes.
The goal was to implement some cross-module operations over time. "git
super status", "git super commit" and others could be handy.
-- 
Duy


* Re: submodules, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 11:46     ` submodules, was " Johannes Schindelin
@ 2010-01-04 18:29       ` Avery Pennarun
  2010-01-04 19:14         ` Jens Lehmann
  0 siblings, 1 reply; 45+ messages in thread
From: Avery Pennarun @ 2010-01-04 18:29 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Heiko Voigt, Jens Lehmann, Git Mailing List, Junio C Hamano,
	Shawn O. Pearce, Paul Mackerras, Lars Hjemli

On Mon, Jan 4, 2010 at 6:46 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> But I think that an important precondition to come up with a better design
> of the submodules is to have suffered the current implementation in
> real-world work using submodules. (Which reminds me very much of the
> autocrlf mess.)

I suffered the current implementation, which is why I wrote
git-subtree :)  I'm still suffering, though; git-subtree works much
better for my own use cases, but after some experience with it, I'm
still not totally happy.

For me one big problem comes down to producing accurate output for
'git log'.  git submodules assume that the history inside the module
is entirely separate (you need to run multiple 'git log' instances to
see the full history); git-subtree assumes that it's entirely
integrated.  In that sense, git-subtree is somewhat more in line with
the core principle of git (we track the history of "the content", not
any particular file or subdir).  Unfortunately, it also exposes a
problem with that core principle: taken to its extreme, "the content"
includes all data in the universe.  And while git could branch and
merge the universe very efficiently in about O(log n) time, 'git log'
output gets less useful about O(n) with the size of the tree.

Neither git-subtree nor git submodules seem to help with this "log
pollution" problem very much - but I don't know what to do that would
be better.

Outside of this, my major problem with submodules is they use separate
work trees and repositories, and thus require lots of extra
housekeeping to get anything done.  I'd be much happier if submodules
would share the same objects/packs/.gitdir/refs/indexfile as the
superproject, and the *only* thing special about them would be that
the superproject's tree points at a commit object instead of a tree
object.  In other words, I think the actual repo format is correct
as-is, but the tools surrounding it cause a lot of confusion.

Imagine if cloning a superproject also checked out the subproject
transparently, and committing dirty data inside the subproject's tree
created a new commit object for the subproject, then tacked that
commit object into the superproject's index for a later commit
(exactly as changing a subdir creates a new tree object that the
parent directory can refer to).
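
For comparison, the two steps imagined above have to be done by hand
with the current tools: commit inside the submodule, then stage the new
submodule commit in the superproject. A minimal sketch of that manual
sequence follows; plan_nested_commit and run_plan are hypothetical helper
names, and the use of "git commit -a" (which only records changes to
already tracked files) is an assumption made to keep the example short.

```python
import subprocess
from typing import List, Tuple

# One step is (directory to run in, argv to execute there).
Step = Tuple[str, List[str]]


def plan_nested_commit(superproject: str, submodule: str, message: str) -> List[Step]:
    """List the commands that commit in a submodule and stage the result."""
    submodule_dir = f"{superproject}/{submodule}"
    return [
        # 1. Create a new commit object inside the submodule's own repository.
        (submodule_dir, ["git", "commit", "-a", "-m", message]),
        # 2. Record the submodule's new HEAD in the superproject's index.
        (superproject, ["git", "add", submodule]),
    ]


def run_plan(steps: List[Step]) -> None:
    """Execute the planned steps in order, stopping at the first failure."""
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

The superproject commit itself is left out on purpose, matching the "for a
later commit" wording above.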

This doesn't solve some use cases, however, such as ones where people
really don't want to check out (or even fetch) the contents of some
submodules, even when they check out the superproject.  The current
implementation *does* handle that situation.  I'm not sure how many
people rely on that behaviour, though.  (And maybe the correct
solution to *that* is proper support for sparse clone/checkout
regardless of submodules.)

Have fun,

Avery


* Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 17:51   ` RFC: display dirty submodule working directory in git gui and gitk Nguyen Thai Ngoc Duy
@ 2010-01-04 18:40     ` Jens Lehmann
  2010-01-04 19:05       ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Jens Lehmann @ 2010-01-04 18:40 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Johannes Schindelin, Git Mailing List, Junio C Hamano,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli

Am 04.01.2010 18:51, schrieb Nguyen Thai Ngoc Duy:
> Incidentally I was just drafting git-super.sh to see how far it goes.
> The goal was to implement some cross-module operations over time. "git
> super status", "git super commit" and others could be handy.

Hm, i'm not sure if this will really help us. I would rather see "git
status" and friends do the right thing for submodules too. Maybe this
has to be configurable but i think the separate commands that one has
to use for submodules now are part of the usability problems we are
seeing.

IMHO putting the functionality of "git submodule summary" into "git
diff" was a step in the right direction. This thread is about adding a
line to the diff output when diffing against the working directory and
a submodule has a dirty working directory too. Then you can ask "git
diff" and it tells you anything you need to know about the submodule
before committing or checking out in the superproject (and IMO later on
"git status" should give us this information too).


* Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 18:40     ` Jens Lehmann
@ 2010-01-04 19:05       ` Junio C Hamano
  2010-01-04 19:21         ` Jens Lehmann
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-01-04 19:05 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Nguyen Thai Ngoc Duy, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli

Jens Lehmann <Jens.Lehmann@web.de> writes:

> Am 04.01.2010 18:51, schrieb Nguyen Thai Ngoc Duy:
>> Incidentally I was just drafting git-super.sh to see how far it goes.
>> The goal was to implement some cross-module operations over time. "git
>> super status", "git super commit" and others could be handy.
>
> Hm, i'm not sure if this will really help us. I would rather see "git
> status" and friends do the right thing for submodules too. Maybe this
> has to be configurable but i think the separate commands that one has
> to use for submodules now are part of the usability problems we are
> seeing.
>
> IMHO putting the functionality of "git submodule summary" into "git
> diff" was a step in the right direction. This thread is about adding a
> line to the diff output when diffing against the working directory and
> a submodule has a dirty working directory too. Then you can ask "git
> diff" and it tells you anything you need to know about the submodule
> before committing or checking out in the superproject (and IMO later on
> "git status" should give us this information too).

Both will be valid approaches to work toward the same goal.  A separate
prototype implementation can be a way to easily figure out what the
desired features are.

If "git super status" does turn out to be consistent with what "git
status" is supposed to do, you can decide to fold that into the latter at
that point.  On the other hand, the information people want from "git
super status" could be different from what they want from "git status",
in which case it might be better for it to become either a new option to
"git status" or a new subcommand of "git submodule".

Or you could start the prototype by changing "git status" and later decide
that the end result needs to become either an optional behaviour or maybe
even a separate command.  Either way the end result will be the same: a good
feature to help people is placed at the most logical place.

For the past 12 months, you and Johan Herland were the people who had more
than one patch of substance to git-submodule.sh, and I would really
appreciate, and at the same time want to encourage, experimentation by
people like you who are heavy users with a need for better submodule
support.

Thanks.


* Re: submodules, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 18:29       ` Avery Pennarun
@ 2010-01-04 19:14         ` Jens Lehmann
  0 siblings, 0 replies; 45+ messages in thread
From: Jens Lehmann @ 2010-01-04 19:14 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Johannes Schindelin, Heiko Voigt, Git Mailing List,
	Junio C Hamano, Shawn O. Pearce, Paul Mackerras, Lars Hjemli

Am 04.01.2010 19:29, schrieb Avery Pennarun:
> For me one big problem comes down to producing accurate output for
> 'git log'.  git submodules assume that the history inside the module
> is entirely separate (you need to run multiple 'git log' instances to
> see the full history); git-subtree assumes that it's entirely
> integrated.  In that sense, git-subtree is somewhat more in line with
> the core principle of git (we track the history of "the content", not
> any particular file or subdir).  Unfortunately, it also exposes a
> problem with that core principle: taken to its extreme, "the content"
> includes all data in the universe.  And while git could branch and
> merge the universe very efficiently in about O(log n) time, 'git log'
> output gets less useful about O(n) with the size of the tree.
> 
> Neither git-subtree nor git submodules seem to help with this "log
> pollution" problem very much - but I don't know what to do that would
> be better.

I think this depends extremely on the use case and may even differ
from submodule to submodule. It might be desirable to be able to
specify which submodule logs you want to see, because only the user
knows what is important for him. But you should be able to ask "git
log" directly without forking it in every submodule you care about,
no?

There has been a thread between Junio and Heiko about group mappings
for submodules. Maybe the configuration could be extended to contain
information about which submodules should add to the superproject's log?
http://thread.gmane.org/gmane.comp.version-control.git/130928/


> Outside of this, my major problem with submodules is they use separate
> work trees and repositories, and thus require lots of extra
> housekeeping to get anything done.  I'd be much happier if submodules
> would share the same objects/packs/.gitdir/refs/indexfile as the
> superproject, and the *only* thing special about them would be that
> the superproject's tree points at a commit object instead of a tree
> object.  In other words, I think the actual repo format is correct
> as-is, but the tools surrounding it cause a lot of confusion.

I don't care deeply where the objects live but agree about the repo
format and the confusion ;-)


> Imagine if cloning a superproject also checked out the subproject
> transparently,

That would be great (at least at checkout time, after clone you
might wanna decide which submodules to initialize first - unless
group mappings are working). Right now we use post-checkout hooks
to do that.


> and committing dirty data inside the subproject's tree
> created a new commit object for the subproject, then tacked that
> commit object into the superproject's index for a later commit
> (exactly as changing a subdir creates a new tree object that the
> parent directory can refer to).

That would be a nice feature.


> This doesn't solve some use cases, however, such as ones where people
> really don't want to check out (or even fetch) the contents of some
> submodules, even when they check out the superproject.  The current
> implementation *does* handle that situation.  I'm not sure how many
> people rely on that behaviour, though.  (And maybe the correct
> solution to *that* is proper support for sparse clone/checkout
> regardless of submodules.)

We do rely on this behavior. But sparse clone or group mappings
could replace that need.


* Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 19:05       ` Junio C Hamano
@ 2010-01-04 19:21         ` Jens Lehmann
  0 siblings, 0 replies; 45+ messages in thread
From: Jens Lehmann @ 2010-01-04 19:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyen Thai Ngoc Duy, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli

Am 04.01.2010 20:05, schrieb Junio C Hamano:
> Jens Lehmann <Jens.Lehmann@web.de> writes:
> 
>> Am 04.01.2010 18:51, schrieb Nguyen Thai Ngoc Duy:
>>> Incidentally I was just drafting git-super.sh to see how far it goes.
>>> The goal was to implement some cross-module operations over time. "git
>>> super status", "git super commit" and others could be handy.
>>
>> Hm, i'm not sure if this will really help us. I would rather see "git
>> status" and friends do the right thing for submodules too. Maybe this
>> has to be configurable but i think the separate commands that one has
>> to use for submodules now are part of the usability problems we are
>> seeing.

> Both will be valid approaches to work toward the same goal.  A separate
> prototype implementation can be a way to easily figure out what the
> desired features are.

> For the past 12 months, you and Johan Herland were the people who had more
> than one patch of substance to git-submodule.sh, and I would really
> appreciate, and at the same time want to encourage, experimentation by
> people like you who are heavy users with a need for better submodule
> support.

Right. It was not my intention to discourage such experimentation with
my reply. I'm sorry if my email gave that impression.


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 22:29     ` submodules' shortcomings, was " Johannes Schindelin
@ 2010-01-04 22:27       ` Shawn O. Pearce
  2010-01-04 22:35         ` Avery Pennarun
  2010-01-04 22:53       ` Avery Pennarun
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Shawn O. Pearce @ 2010-01-04 22:27 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Paul Mackerras,
	Heiko Voigt, Lars Hjemli

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Besides, as long as there is enough reason to have out-of-Git alternative 
> solutions such as repo, submodules deserve to be 2nd-class citizens.

If I didn't think I'd be shot by current submodule users, I'd offer
to write a full replacement based around the current in repository
format, but with sane features like we have in repo.

Actually, that's why repo happened.  I felt like submodules was
already too frozen to accept a different approach.  And another
guy here thought XML might be a solution to a problem...  :-|

-- 
Shawn.


* submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 17:04   ` Jens Lehmann
@ 2010-01-04 22:29     ` Johannes Schindelin
  2010-01-04 22:27       ` Shawn O. Pearce
                         ` (3 more replies)
  0 siblings, 4 replies; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-04 22:29 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Git Mailing List, Junio C Hamano, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli

Hi,

On Mon, 4 Jan 2010, Jens Lehmann wrote:

> Am 04.01.2010 10:44, schrieb Johannes Schindelin:
> > The real problem is that submodules in the current form are not very 
> > well designed.
> 
> IMVHO using the tree sha1 for a submodule seems to be the 'natural' way 
> to include another git repo. And it gives the reproducibility i expect 
> from a scm. Or am i missing something?

You do remember the discussion at the Alles wird Git about the need for 
Subversion external-like behavior, right?

> It looks to me as if most shortcomings come from the fact that most git
> commands tend to ignore submodules (and if they don't, like git gui and 
> gitk do now, they e.g. only show certain aspects of their state).

It is not only ignoring.  It is not being able to cope with the state only 
submodules can be in (see below).

> Submodules are in heavy use in our company since last year. Virtually 
> every patch i submitted for submodules came from that experience and 
> scratched an itch i or one of my colleagues had (and the situation did 
> already improve noticeably by the few things we changed). We are still 
> convinced that using submodules was the right decision. But some work 
> has still to be done to be able to use them easily and to get rid of 
> some pitfalls.

Submodules may be the best way you have in Git for your workflow ATM.  
But that does not mean that the submodule design is in any way 
thought-through.

Just a few shortcomings that do show up in my main project (and to a 
small extent in msysGit, as you are probably aware):

- submodules were designed with a strong emphasis on not being forced to 
  check them out.  But Git makes it very inconvenient to actually check
  submodules out, let alone check them out at clone-time.  And it is 
  outright impossible to _enforce_ a submodule to be checked out.

- among other use cases, submodules are recommended for sharing content 
  between two different repositories. But it is part of the design that it 
  is _very_ easy to forget to commit, or push the changes in the submodule 
  that are required for the integrity of the superproject.

- that use case -- sharing content between different repositories -- is 
  not really supported by submodules, but rather an afterthought.  This is 
  all too obvious when you look at the restriction that the shared content 
  must be in a single subdirectory.

- submodules would be a perfect way to provide a fast-forward-only media 
  subdirectory that is written to by different people (artists) than
  the superproject (developers).  But there is no mechanism to enforce 
  shallow fetches, which means that this use case cannot be handled 
  efficiently using Git.

- related are the use cases where it is desired not to have a fixed 
  submodule tip committed to the superproject, but always to update to the 
  current, say, master (like Subversion's externals).  This use case has 
  been wished away by the people who implemented submodules in Git.  But 
  reality has this nasty habit of ignoring your wishes, does it not?

- there have been patches supporting rebasing submodules, i.e.  
  submodules where a "git submodule update" rebases the current branch to 
  the revision committed to the superproject rather than detaching the 
  HEAD, which everybody who ever contributed to a project with submodules 
  should agree is a useful thing. But the patches have only been discussed
  to death, to the point where the discussion's information content was
  converging to zero, yet the patches did not make it into Git.  (FWIW
  this is one reason why I refuse to write patches to git-submodule.sh: I
  refuse to let my time be wasted like that.)

- working directories with GIT_DIRs are a very different beast from single 
  files.  That alone leads to a _lot_ of problems.  The original design of 
  Git had only a couple of states for named content (AKA files): clean, 
  added, removed, modified.  The states that are possible with submodules 
  are for the most part not handled _at all_ by most Git commands (and it 
  is sometimes very hard to decide what would be the best way to handle 
  those states, either).  Just think of a submodule at a different 
  revision than committed in the superproject, with uncommitted changes, 
  ignored and unignored files, a few custom hooks, a bit of additional 
  metadata in the .git/config, and just for fun, a few temporary files in 
  .git/ which are used by the hooks.

- while it might be called clever that the submodules' metadata are stored 
  in .gitmodules in the superproject (and are therefore naturally tracked 
  with Git), the synchronization with .git/config is performed exactly 
  once -- when you initialize the submodule.  You are likely to miss out 
  on _every_ change you pulled into the superproject.
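
The last point in the list can be checked by hand. A minimal sketch,
assuming the output of
"git config -f .gitmodules --get-regexp '^submodule\..*\.url$'" and of the
same query against .git/config ("git config -f .git/config ...") has been
captured as text. The function names are hypothetical, and submodule names
containing spaces are not handled.

```python
from typing import Dict, Tuple


def parse_submodule_urls(get_regexp_output: str) -> Dict[str, str]:
    """Map submodule name -> url from `git config --get-regexp` output lines."""
    urls = {}
    for line in get_regexp_output.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>"; assumption: the key has no spaces.
        key, _, value = line.partition(" ")
        # The key looks like "submodule.<name>.url"; <name> may contain dots.
        if key.startswith("submodule.") and key.endswith(".url"):
            name = key[len("submodule."):-len(".url")]
            urls[name] = value
    return urls


def stale_submodule_urls(
    gitmodules_output: str, local_output: str
) -> Dict[str, Tuple[str, str]]:
    """Return {name: (url in .git/config, url in .gitmodules)} where they differ."""
    tracked = parse_submodule_urls(gitmodules_output)
    local = parse_submodule_urls(local_output)
    return {
        name: (local[name], url)
        for name, url in tracked.items()
        # Submodules that were never initialized have no local entry; skip them.
        if name in local and local[name] != url
    }
```

Only the url key is compared here; any other per-submodule setting would
need the same treatment.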

All in all, submodules are very clumsy to work with, and you are literally 
forced to provide scripts in the superproject to actually work with the 
submodules.

> > In the short run, we can paper over the shortcomings of the submodules
> > by introducing a command line option "--include-submodules" to 
> > update-refresh, diff-files and diff-index, though.
> 
> Maybe this is the way to go for now (and hopefully we can turn this 
> option on by default later because we did the right thing ;-).

I do not think that --include-submodules is a good default.  It is just 
too expensive in terms of I/O even to check the status in a superproject 
with a lot of submodules.

Besides, as long as there is enough reason to have out-of-Git alternative 
solutions such as repo, submodules deserve to be 2nd-class citizens.

Ciao,
Dscho


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 22:27       ` Shawn O. Pearce
@ 2010-01-04 22:35         ` Avery Pennarun
  0 siblings, 0 replies; 45+ messages in thread
From: Avery Pennarun @ 2010-01-04 22:35 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Johannes Schindelin, Jens Lehmann, Git Mailing List,
	Junio C Hamano, Paul Mackerras, Heiko Voigt, Lars Hjemli

On Mon, Jan 4, 2010 at 5:27 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> Besides, as long as there is enough reason to have out-of-Git alternative
>> solutions such as repo, submodules deserve to be 2nd-class citizens.
>
> If I didn't think I'd be shot by current submodule users, I'd offer
> to write a full replacement based around the current in repository
> format, but with sane features like we have in repo.

Perhaps write it and call it 'git sub' or something.  Put them both
in, and let users decide which they want to use.  Or, like git
subtree, maintain it separately.

Personally, I've avoided tools like repo because they seem to try to
kidnap my *entire* git experience, most of which is already fine.
It's just submodules that are crazy.  I think it's probably similar
for other people.

Avery


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule  working directory in git gui and gitk
  2010-01-04 22:29     ` submodules' shortcomings, was " Johannes Schindelin
  2010-01-04 22:27       ` Shawn O. Pearce
@ 2010-01-04 22:53       ` Avery Pennarun
  2010-01-05  8:11       ` Jens Lehmann
  2010-01-05 20:38       ` Pau Garcia i Quiles
  3 siblings, 0 replies; 45+ messages in thread
From: Avery Pennarun @ 2010-01-04 22:53 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli

On Mon, Jan 4, 2010 at 5:29 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Mon, 4 Jan 2010, Jens Lehmann wrote:
>> IMVHO using the tree sha1 for a submodule seems to be the 'natural' way
>> to include another git repo. And it gives the reproducibility i expect
>> from a scm. Or am i missing something?
>
> You do remember the discussion at the Alles wird Git about the need for
> Subversion external-like behavior, right?

I'm not sure why this is such an issue.  Basically, non-version-locked
submodules are about the easiest thing in the world; that's why CVS
and SVN supported them first.  (SVN later added version-locking like
git has.)

All you need is a .gitignore entry and a trivial script that checks
out the external.  If you want to be fancy, this operation could be
part of git, but it's such a totally different case (and an easy one,
no less) that I think it ought to be treated totally separately.

> - among other use cases, submodules are recommended for sharing content
>  between two different repositories. But it is part of the design that it
>  is _very_ easy to forget to commit, or push the changes in the submodule
>  that are required for the integrity of the superproject.
[...]
> - working directories with GIT_DIRs are a very different beast from single
>  files.  That alone leads to a _lot_ of problems.  The original design of
>  Git had only a couple of states for named content (AKA files): clean,
>  added, removed, modified.  The states that are possible with submodules
>  are for the most part not handled _at all_ by most Git commands (and it
>  is sometimes very hard to decide what would be the best way to handle
>  those states, either).  Just think of a submodule at a different
>  revision than committed in the superproject, with uncommitted changes,
>  ignored and unignored files, a few custom hooks, a bit of additional
>  metadata in the .git/config, and just for fun, a few temporary files in
>  .git/ which are used by the hooks.


I think this is primarily because checked-out submodules currently
have their own .git directories (with their own config, index, etc).
If they were considered *part* of the superproject's repo checkout, and
updated upon switching branches, etc, this whole class of problems
would go away.

> - that use case -- sharing content between different repositories -- is
>  not really supported by submodules, but rather an afterthought.  This is
>  all too obvious when you look at the restriction that the shared content
>  must be in a single subdirectory.

I haven't found the subdir requirement to be much of an issue, at
least on Unix where I can simply work around it using symlinks from
the superproject into the subproject.  It's obviously more gross on
Windows, but I've worked around it there too.  This one isn't a daily
aggravation for me, though maybe it is for others.  And any cure I can
think of sounds rather worse than the disease.

> - submodules would be a perfect way to provide a fast-forward-only media
>  subdirectory that is written to by different people (artists) than to
>  the superproject (developers).  But there is no mechanism to enforce
>  shallow fetches, which means that this use case cannot be handled
>  efficiently using Git.

I doubt you want to "enforce" shallow fetches.  And if you just want
to "allow" shallow fetches, or default to shallow fetches, I'd think
it would be pretty easy to add.  This hasn't been important to me
either.  (It seems to be not too important to git users in general, or
git's support *in general* for shallow repositories would be more
featureful.)

> - while it might be called clever that the submodules' metadata are stored
>  in .gitmodules in the superproject (and are therefore naturally tracked
>  with Git), the synchronization with .git/config is performed exactly
>  once -- when you initialize the submodule.  You are likely to miss out
>  on _every_ change you pulled into the superproject.

This could be fixed too, though I gave up on git-submodule before I
bothered to fix it myself.

The correct solution here is simply to not ever copy the settings from
.gitmodules into .git/config.  Instead, git-submodule should read
.gitmodules as defaults, and then override those defaults with
anything in .git/config.  99% of users will probably not need to ever
put any of their settings in .git/config, and so this problem
disappears.
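
A minimal sketch of that lookup order, assuming both files have already
been parsed into plain dictionaries keyed by submodule name (the function
name and the dictionary layout are hypothetical, not anything
git-submodule provides):

```python
def effective_submodule_config(gitmodules, local_config):
    """Return per-submodule settings.

    Values from .gitmodules act as defaults; anything set for the same
    submodule in .git/config overrides them key by key.
    """
    merged = {}
    for name in gitmodules.keys() | local_config.keys():
        settings = dict(gitmodules.get(name, {}))    # tracked defaults
        settings.update(local_config.get(name, {}))  # local overrides win
        merged[name] = settings
    return merged
```

With nothing set locally the result is simply the .gitmodules content, so
a change pulled into the superproject takes effect without any
synchronization step.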

> All in all, submodules are very clumsy to work with, and you are literally
> forced to provide scripts in the superproject to actually work with the
> submodules.

Agreed; I do this in every project which uses git-submodule.  (And
from doing so, I learned that the value-added of git-submodule is
nearly zero.  My script does most of the work, and it could just as
easily check out the submodule as a git repo too.  I could even choose
to version-lock or not version-lock the checked-out submodule: just
hardcode the commitid into my script!)

> I do not think that --include-submodules is a good default.  It is just
> too expensive in terms of I/O even to check the status in a superproject
> with a lot of submodules.

I've thought about this a lot, and I think having a special case for
submodules here is the wrong line of thinking.  A big project
*without* submodules has this same problem.  The "real" solution is to
just make status checks faster.

(This is actually possible to do: in the extreme case, you just have a
daemon running with inotify or the Windows equivalent.  TortoiseSvn
reputedly does something like this.  I've thought of writing such a
daemon myself to just twiddle --assume-{un,}changed flags at the right
times, particularly since status checks in Windows are so ridiculously
slow.  But I got frustrated when it was *still* slow even after
setting --assume-unchanged on all the files in the index.  git still
scans directories to detect *unknown* files, and there seems to be no
way to turn it off or, moreover, to provide the list of unknown files
from some other source.)

Have fun,

Avery


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 22:29     ` submodules' shortcomings, was " Johannes Schindelin
  2010-01-04 22:27       ` Shawn O. Pearce
  2010-01-04 22:53       ` Avery Pennarun
@ 2010-01-05  8:11       ` Jens Lehmann
  2010-01-05  9:33         ` Junio C Hamano
  2010-01-05  9:46         ` Johannes Schindelin
  2010-01-05 20:38       ` Pau Garcia i Quiles
  3 siblings, 2 replies; 45+ messages in thread
From: Jens Lehmann @ 2010-01-05  8:11 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Git Mailing List, Junio C Hamano, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli, Avery Pennarun

Am 04.01.2010 23:29, schrieb Johannes Schindelin:
> You do remember the discussion at the Alles wird Git about the need for 
> Subversion external-like behavior, right?

Yup. But never having used svn, let alone externals, i think i just
did not get it then ;-)


> - submodules were designed with a strong emphasis on not being forced to 
>   check them out.  But Git makes it very inconvenient to actually check 
>   submodules out, let alone check them out at clone-time.  And it is 
>   outright impossible to _enforce_ a submodule to be checked out.

Absolutely. But i think the group mappings discussed by Junio and Heiko
are a good starting point to solve that problem:
http://thread.gmane.org/gmane.comp.version-control.git/130928/

This should be solvable by putting the necessary information into
.gitmodules and have git clone use it.


> - among other use cases, submodules are recommended for sharing content 
>   between two different repositories. But it is part of the design that it 
>   is _very_ easy to forget to commit, or push the changes in the submodule 
>   that are required for the integrity of the superproject.

Definitely (and if i got that right, svn externals have the same problem).

What about checking for every submodule before a push in the superproject
that its HEAD is on a remote branch? I don't think we can provide full
safety here, but we could handle the 99% case of a forgotten push in the
submodule. This could even be done with a rather simple hook (if we had a
pre-push hook that is :-).
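
A rough sketch of such a check, assuming it is run from the top of the
superproject with the submodule paths already known. The function names
are made up for illustration; the only git call used is
"git branch -r --contains HEAD". It inspects remote-tracking branches
only, so it reflects the state of the last fetch or push and is a
heuristic, not full safety:

```python
import subprocess


def remote_branches_containing_head(submodule_path):
    """List remote-tracking branches in the submodule that contain its HEAD."""
    out = subprocess.run(
        ["git", "branch", "-r", "--contains", "HEAD"],
        cwd=submodule_path, capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def unpushed_submodules(submodule_paths, lookup=remote_branches_containing_head):
    """Return the submodules whose HEAD is on no remote-tracking branch."""
    return [path for path in submodule_paths if not lookup(path)]
```

A hook or wrapper script could refuse the push in the superproject
whenever unpushed_submodules() returns a non-empty list.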


> - that use case -- sharing content between different repositories -- is 
>   not really supported by submodules, but rather an afterthought.  This is 
>   all too obvious when you look at the restriction that the shared content 
>   must be in a single subdirectory.

I don't see that as a problem (and it's the same with svn externals, no?).

And having worked for a long time with a RCS variant which allowed
"projects" to contain an arbitrary list of files, i don't think this is
a problem (but forgetting to add new files to this list really is, so
putting everything in one directory is *much* safer IMHO).
And: almost all files were properly grouped in directories after a decade
of development even though that was not enforced by the scm at all.


> - related are the use cases where it is desired not to have a fixed 
>   submodule tip committed to the superproject, but always to update to the 
>   current, say, master (like Subversion's externals).  This use case has 
>   been wished away by the people who implemented submodules in Git.  But 
>   reality has this nasty habit of ignoring your wishes, does it not?

Having read up about svn externals in the meantime, what about something
like this:
- Add a command like "git submodule forward" (as update is already in
  use) that takes an optional -b <branchname>. It does a fetch in the
  submodule, then tries to fast forward (or rebase) to master or the
  branch given and stages this commit in the superproject. This should
  be the equivalent to doing an "svn update" in a repo with externals.
  Or am i missing something?
  (And we could avoid the detached HEAD in the fast forward case by
  really checking out the branch in the submodule)
- We could also add an option to "git submodule add" to specify the
  default branch name for forward.
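
The steps of the proposed "forward" could be sketched roughly like this.
"git submodule forward" does not exist; the function names, the "origin"
remote and the assumption that a local branch of the given name already
exists in the submodule are all illustrative. Only the fast-forward case
is covered, not the rebase variant:

```python
import subprocess


def forward_commands(submodule, branch="master", remote="origin"):
    """Return (cwd, argv) steps for the hypothetical 'forward' operation."""
    return [
        (submodule, ["git", "fetch", remote]),
        # Check out the branch itself so HEAD is not left detached.
        (submodule, ["git", "checkout", branch]),
        # Refuse anything that is not a fast-forward.
        (submodule, ["git", "merge", "--ff-only", f"{remote}/{branch}"]),
        # Stage the new submodule commit in the superproject.
        (".", ["git", "add", submodule]),
    ]


def forward(submodule, branch="master", remote="origin", run=subprocess.run):
    """Run the steps in order from the superproject root; stop on first failure."""
    for cwd, argv in forward_commands(submodule, branch, remote):
        run(argv, cwd=cwd, check=True)
```

The branch argument plays the role of the optional -b <branchname>.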


> - while it might be called clever that the submodules' metadata are stored 
>   in .gitmodules in the superproject (and are therefore naturally tracked 
>   with Git), the synchronization with .git/config is performed exactly 
>   once -- when you initialize the submodule.  You are likely to miss out 
>   on _every_ change you pulled into the superproject.

Yes. This synchronization could be either obsoleted by only using
.gitmodules or automated.


> Besides, as long as there is enough reason to have out-of-Git alternative 
> solutions such as repo, submodules deserve to be 2nd-class citizens.

I think in the long run to make submodules first class citizens the
following submodule commands must be obsoleted by their regular git
parts: init (by git clone), status (by git status), update (by git
checkout), summary (already in git diff thanks to your patch) and sync
(maybe Avery's idea of only relying on .gitmodules and not copying data
into .git/config would solve this).
That would leave git submodule add, foreach and maybe a command to do
what svn update does for externals and another to manipulate things like
group membership etc.


Which reminds me of Sverre's quote from Alles Wird Git:
"Yes, it is possible. But it will be hard."


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05  8:11       ` Jens Lehmann
@ 2010-01-05  9:33         ` Junio C Hamano
  2010-01-05 10:07           ` Johannes Schindelin
  2010-01-05 11:57           ` Jens Lehmann
  2010-01-05  9:46         ` Johannes Schindelin
  1 sibling, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-01-05  9:33 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Jens Lehmann <Jens.Lehmann@web.de> writes:

> Am 04.01.2010 23:29, schrieb Johannes Schindelin:
> ...
>> - submodules were designed with a strong emphasis on not being forced to 
>>   check them out.  But Git makes it very inconvenient to actually check 
>>   submodules out, let alone check them out at clone-time.  And it is 
>>   outright impossible to _enforce_ a submodule to be checked out.
>
> Absolutely. But i think the group mappings discussed by Junio and Heiko
> are a good starting point to solve that problem:
> http://thread.gmane.org/gmane.comp.version-control.git/130928/
>
> This should be solvable by putting the necessary information into
> .gitmodules and have git clone use it.

I sense there is a chicken and egg problem, but I'll let it pass for now.

>> - among other use cases, submodules are recommended for sharing content 
>>   between two different repositories. But it is part of the design that it 
>>   is _very_ easy to forget to commit, or push the changes in the submodule 
>>   that are required for the integrity of the superproject.
>
> Definitely (and if i got that right, svn externals have the same problem).
>
> What about checking for every submodule before a push in the superproject
> that its HEAD is on a remote branch? I don't think we can provide full
> safety here, but we could handle the 99% case of a forgotten push in the
> submodule. This could even be done with a rather simple hook (if we had a
> pre-push hook that is :-).

You don't need "pre-push" hook, if the eventual goal is to integrate this
into "git push" proper; it can notice submodule directories, descend
into them, check if the remote lacks the necessary commit and invoke "git
push" via run_command() interface as needed.

>> - related are the use cases where it is desired not to have a fixed 
>>   submodule tip committed to the superproject, but always to update to the 
>>   current, say, master (like Subversion's externals).  This use case has 
>>   been wished away by the people who implemented submodules in Git.  But 
>>   reality has this nasty habit of ignoring your wishes, does it not?
>
> Having read up about svn externals in the meantime, what about something
> like this:
> - Add a command like "git submodule forward" (as update is already in
>   use) that takes an optional -b <branchname>. It does a fetch in the
>   submodule, then tries to fast forward (or rebase) to master or the
>   branch given and stages this commit in the superproject. This should
>   be the equivalent to doing an "svn update" in a repo with externals.
>   Or am i missing something?
>   (And we could avoid the detached HEAD in the fast forward case by
>   really checking out the branch in the submodule)
> - We could also add an option to "git submodule add" to specify the
>   default branch name for forward.

Instead of recording a specific submodule commit in the superproject, we
could record a branch name (this would need a separate "gitlink" type of
object we toyed around during the early days of submodule design) to say
"the tip of the branch".

But there is a difference between a distributed system and a centralized
one like Subversion.  When you say "tip of the branch", you have to say
"which repository".  If your position is that _any_ repository will do as
long as the commit is at the tip of the named branch, that is like saying
you don't care what commit it really is, as you are free to muck with
branch heads in your copy of submodule repository, by adding commits, or
resetting new ones away.  For that matter, your 'master' branch in the
submodule repository may not build-on/fork-from the 'master' branch in the
upstream of it, so even "tip of the branch by _this name_" is still fuzzy.

I am not saying "any commit will do" is necessarily a bad position to
take.  But people who claim they want to say "this branch" need to realize
what they are really saying: whatever you record in the superproject
commit is immaterial.  In other words, "this superproject will work no
matter which version of submodule is checked out at its location".

That actually is a very valid thing to say in some situations (Dscho
mentioned different versions of artwork checked out as a submodule in a
developer's superproject to build an app).  Interestingly enough, some
people seem to think that we place too much importance on not having to
check out submodules, but it indeed is a very natural extension of "any
commit will do".  If the configuration you chose for your build does not
depend on any files from there, it will truly be "any commit will do",
including "nothing checked out there is just fine".

So it is not necessarily a bad thing if the commit checked out in the
submodule repository is different from what the superproject records in
its index when a commit is made in the superproject.  We allow committing
with local changes in regular files, while we do notify the users about
them to avoid mistakes.  We should give the same kind of notification
about submodules, but the "local changes" need to be thought out more
carefully than plain files in the superproject itself.  Do uncommitted
changes in the index of submodule repository count?  Local changes in the
work tree files?  What about untracked files that the user might have
forgotten to add?  Should they be warned?  What about the commit in the
submodule repository being a non-descendant of the commit recorded in the
HEAD of the superproject's tree, resulting in a non-ff change at the
submodule level?

What this also means is that it is important to

 (1) be able to simply be a user of the submodule (in such a scenario, the
     developer who uses artwork from designer's repository does _not_ want
     to commit the submodule, but he does want to have a recent checkout
     of it, and he might even make some tweaks); and

 (2) be able to commit the state of the superproject, even if there is
     a mismatch between the submodule commit recorded in the superproject
     and the actual version that is checked into the authoritative
     submodule project by the designer (perhaps he hasn't pulled in the
     submodule while traveling).

In other words, even if the default is made to "always clone and checkout
all the submodules, and before allowing anything be done in the higher
levels of superprojects, submodules must be made in sync with their
latest", there has to be a way to override such rigid constraints for
the resulting system to be usable.

> I think in the long run to make submodules first class citizens the
> following submodule commands must be obsoleted by their regular git
> parts: init (by git clone), status (by git status), update (by git
> checkout), summary (already in git diff thanks to your patch) and sync
> (maybe Avery's idea of only relying on .gitmodules and not copying data
> into .git/config would solve this).

I think "clone" has a chicken-and-egg problem.  If all of your project
participants are expected to check out all the submodules, are expected to
make commits in all of them, and essentially have to track everything in
sync, then "clone" can obviously do that without asking what kind of
participant you are [*1*].  Otherwise, you need to have some mechanism
(e.g. "group mapping" you mentioned earlier) for the user to specify "I am
interested in these submodules" before the actual sub-clones happen,
but until you clone the superproject that has some description for that
mechanism to use, and the user to see what's available, you cannot say
what kind of participant you are.  It has to become a two-step process;
either "clone" goes interactive in the middle, or you let the clone
happen and then use "submodule init" to express that information.


[Footnote]

*1* of course, in such a scenario you have to question what you are using
submodules for.


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05  8:11       ` Jens Lehmann
  2010-01-05  9:33         ` Junio C Hamano
@ 2010-01-05  9:46         ` Johannes Schindelin
  2010-01-05 12:19           ` Jens Lehmann
  2010-01-05 14:27           ` Heiko Voigt
  1 sibling, 2 replies; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-05  9:46 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Git Mailing List, Junio C Hamano, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli, Avery Pennarun

Hi,

On Tue, 5 Jan 2010, Jens Lehmann wrote:

> Am 04.01.2010 23:29, schrieb Johannes Schindelin:
> 
> > - submodules were designed with a strong emphasis on not being forced 
> >   to check them out.  But Git makes it very inconvenient to actually 
> >   check submodules out, let alone check them out at clone-time.  And 
> >   it is outright impossible to _enforce_ a submodule to be checked 
> >   out.
> 
> Absolutely. But i think the group mappings discussed by Junio and Heiko
> are a good starting point to solve that problem:
> http://thread.gmane.org/gmane.comp.version-control.git/130928/
> 
> This should be solvable by putting the necessary information into
> .gitmodules and have git clone use it.

And of course, existing Git versions will not handle it correctly.  
Judging from the rebasing-submodule patch, the next Git version will not 
handle it either.

But you're correct, one has to start _somewhere_.

> > - among other use cases, submodules are recommended for sharing 
> >   content between two different repositories. But it is part of the 
> >   design that it is _very_ easy to forget to commit, or push the 
> >   changes in the submodule that are required for the integrity of the 
> >   superproject.
> 
> Definitely (and if i got that right, svn externals have the same problem).

Yes, svn externals have that problem.  But we do not need to take the svn 
externals example more seriously than it deserves: it illustrates a valid 
use case that is not handled by submodules.  But svn externals are not 
what I would call "elegant design" either.

> What about checking for every submodule before a push in the 
> superproject that its HEAD is on a remote branch? I don't think we can 
> provide full safety here, but we could handle the 99% case of a 
> forgotten push in the submodule. This could even be done with a rather 
> simple hook (if we had a pre-push hook that is :-).

The problem with hooks is that for security reasons, every user has to 
install them in every repository herself (unless she is working on a 
machine serviced by an overzealous administrator).

> > - that use case -- sharing content between different repositories -- 
> >   is not really supported by submodules, but rather an afterthought.  
> >   This is all too obvious when you look at the restriction that the 
> >   shared content must be in a single subdirectory.
> 
> I don't see that as a problem (and it's the same with svn externals, no?).
> 
> And having worked for a long time with a RCS variant which allowed
> "projects" to contain an arbitrary list of files, i don't think this is
> a problem (but forgetting to add new files to this list really is, so
> putting everything in one directory is *much* safer IMHO).
> And: almost all files were properly grouped in directories after a decade
> of development even though that was not enforced by the scm at all.

That happens to be the case here, I agree.

But I have a use case here where the shared content is _not_ a library 
that can live in a subdirectory naturally.

> > - related are the use cases where it is desired not to have a fixed 
> >   submodule tip committed to the superproject, but always to update to 
> >   the current, say, master (like Subversion's externals).  This use 
> >   case has been wished away by the people who implemented submodules 
> >   in Git.  But reality has this nasty habit of ignoring your wishes, 
> >   does it not?
> 
> Having read up about svn externals in the meantime, what about something
> like this:
> - Add a command like "git submodule forward" (as update is already in
>   use) that takes an optional -b <branchname>. It does a fetch in the
>   submodule, then tries to fast forward (or rebase) to master or the
>   branch given and stages this commit in the superproject. This should
>   be the equivalent to doing an "svn update" in a repo with externals.
>   Or am i missing something?

Yes.  It is not the decision of the fetcher, but of the guy who adds the 
submodule to decide what it is.

> - We could also add an option to "git submodule add" to specify the
>   default branch name for forward.

That's an obvious precondition for proper always-tip-submodules.  But 
Git's core data structure, the index, does not allow for it.  _That_ is 
the difficulty, not what the user interface would look like.

> > - while it might be called clever that the submodules' metadata are 
> >   stored in .gitmodules in the superproject (and are therefore 
> >   naturally tracked with Git), the synchronization with .git/config is 
> >   performed exactly once -- when you initialize the submodule.  You 
> >   are likely to miss out on _every_ change you pulled into the 
> >   superproject.
> 
> Yes. This synchronization could be either obsoleted by only using
> .gitmodules or automated.

I start to wonder whether the insistence that .gitmodules' settings must 
be overrideable makes any sense in practice.

> > Besides, as long as there is enough reason to have out-of-Git 
> > alternative solutions such as repo, submodules deserve to be 2nd-class 
> > citizens.
> 
> I think in the long run to make submodules first class citizens the
> following submodule commands must be obsoleted by their regular git
> parts: init (by git clone), status (by git status), update (by git
> checkout), summary (already in git diff thanks to your patch) and sync
> (maybe Avery's idea of only relying on .gitmodules and not copying data
> into .git/config would solve this).

Avery's idea was to make .gitmodules overrideable in .git/config, which 
would share almost all the shortcomings I listed for the current solution.

> That would leave git submodule add, foreach and maybe a command to do 
> what svn update does for externals and another to manipulate things like 
> group membership etc.
> 
> Which reminds me of Sverre's quote from Alles Wird Git: "Yes, it is 
> possible. But it will be hard."

Yeah, it will be hard.  Especially since submodule, as a bloated shell 
script, has outlived its usefulness by far.  (It would be 
different if it was a nice, small, elegant script, but you have looked at 
it, so you know why I am disgusted.)

Ciao,
Dscho


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05  9:33         ` Junio C Hamano
@ 2010-01-05 10:07           ` Johannes Schindelin
  2010-01-05 11:57           ` Jens Lehmann
  1 sibling, 0 replies; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-05 10:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jens Lehmann, Git Mailing List, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli, Avery Pennarun

Hi,

On Tue, 5 Jan 2010, Junio C Hamano wrote:

> Jens Lehmann <Jens.Lehmann@web.de> writes:
> 
> > Am 04.01.2010 23:29, schrieb Johannes Schindelin:
> > ...
> >> - among other use cases, submodules are recommended for sharing 
> >>   content between two different repositories. But it is part of the 
> >>   design that it is _very_ easy to forget to commit, or push the 
> >>   changes in the submodule that are required for the integrity of the 
> >>   superproject.
> >
> > Definitely (and if i got that right, svn externals have the same 
> > problem).
> >
> > What about checking for every submodule before a push in the 
> > superproject that its HEAD is on a remote branch? I don't think we can 
> > provide full safety here, but we could handle the 99% case of a 
> > forgotten push in the submodule. This could even be done with a rather 
> > simple hook (if we had a pre-push hook that is :-).
> 
> You don't need "pre-push" hook, if the eventual goal is to integrate this
> into "git push" proper; it can notice submodule directories, descend
> into them, check if the remote lacks the necessary commit and invoke "git
> push" via run_command() interface as needed.

That is obvious, _iff_ we make the necessary changes in core Git.  Jens' 
point was that you can do it with hooks, too.

> >> - related are the use cases where it is desired not to have a fixed 
> >>   submodule tip committed to the superproject, but always to update 
> >>   to the current, say, master (like Subversion's externals).  This 
> >>   use case has been wished away by the people who implemented 
> >>   submodules in Git.  But reality has this nasty habit of ignoring 
> >>   your wishes, does it not?
> >
> > Having read up about svn externals in the meantime, what about 
> > something like this:
> > - Add a command like "git submodule forward" (as update is already in 
> >   use) that takes an optional -b <branchname>. It does a fetch in the 
> >   submodule, then tries to fast forward (or rebase) to master or the 
> >   branch given and stages this commit in the superproject. This should 
> >   be the equivalent to doing an "svn update" in a repo with externals.  
> >   Or am i missing something?  (And we could avoid the detached HEAD in 
> >   the fast forward case by really checking out the branch in the 
> >   submodule)
> > - We could also add an option to "git submodule add" to specify the 
> >   default branch name for forward.
> 
> Instead of recording a specific submodule commit in the superproject, we
> could record a branch name (this would need a separate "gitlink" type of
> object we toyed around during the early days of submodule design) to say
> "the tip of the branch".

Yes, and it would be as limited (but in a different way) as the current 
gitlink.

You might argue that "gitlink" in its current form has not raised too many 
complaints.  But that is only because next to nobody uses submodules 
unless forced to.

> But there is a difference between a distributed system and a centralized 
> one like Subversion.  When you say "tip of the branch", you have to say 
> "which repository".  If your position is that _any_ repository will do 
> as long as the commit is at the tip of the named branch, that is like 
> saying you don't care what commit it really is, as you are free to muck 
> with branch heads in your copy of submodule repository, by adding 
> commits, or resetting new ones away.  For that matter, your 'master' 
> branch in the submodule repository may not build-on/fork-from the 
> 'master' branch in the upstream of it, so even "tip of the branch by 
> _this name_" is still fuzzy.
> 
> I am not saying "any commit will do" is necessarily a bad position to 
> take.  But people who claim they want to say "this branch" need to 
> realize what they are really saying: whatever you record in the 
> superproject commit is immaterial.  In other words, "this superproject 
> will work no matter which version of submodule is checked out at its 
> location".
> 
> That actually is a very valid thing to say in some situations (Dscho
> mentioned different versions of artwork checked out as a submodule in a
> developer's superproject to build an app).  Interestingly enough, some
> people seem to think that we place too much importance on not having to
> check out submodules, but it indeed is a very natural extension of "any
> commit will do".  If the configuration you chose for your build does not
> depend on any files from there, it will truly be "any commit will do",
> including "nothing checked out there is just fine".

Come on Junio, do not insult my intelligence.

You know all too well about scenarios where a superproject tracks a 
3rd-party project which the superproject's developers do not contribute 
to.

"nothing checked out there is just fine".  Pfff.  That's ridiculous.  
You'll have to try much harder than that.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05  9:33         ` Junio C Hamano
  2010-01-05 10:07           ` Johannes Schindelin
@ 2010-01-05 11:57           ` Jens Lehmann
  2010-01-05 18:31             ` Junio C Hamano
  1 sibling, 1 reply; 45+ messages in thread
From: Jens Lehmann @ 2010-01-05 11:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Am 05.01.2010 10:33, schrieb Junio C Hamano:
> So it is not necessarily a bad thing if the commit checked out in the
> submodule repository is different from what the superproject records in
> its index when a commit is made in the superproject.  We allow committing
> with local changes in regular files, while we do notify the users about
> them to avoid mistakes.  We should give the same kind of notification
> about submodules, but the "local changes" need to be thought out more
> carefully than plain files in the superproject itself.  Do uncommitted
> changes in the index of submodule repository count?  Local changes in the
> work tree files?  What about untracked files that the user might have
> forgotten to add?  Should they be warned?  What about the commit in the
> submodule repository being a non-descendant of the commit recorded in the
> HEAD of the superproject's tree, resulting in a non-ff change at the
> submodule level?

Committing in the superproject with any dirty state in a submodule
should always work (same as it does with local changes in regular files),
but be visible for the user (again as local changes in regular files are).
Right now we do not show enough information about a submodule to protect
the user from accidentally throwing away changes made inside it.
The only thing we show right now are the differences between submodule
commits and what the superproject has in its index and in its commits.
Missing are:

  a) modified files
     I think these have to be shown, no matter if they are checked into
     the submodules index or not (because until they are committed, they
     can't be staged in the superproject anyway).

  b) new unignored files
     IMO these files should show up too (the superproject doesn't show
     ignored files, the submodule state shouldn't do that either). But
     OTOH i don't see a possibility for loss of data when this state is
     not shown.

  c) a detached HEAD not on any local *or* remote branch
     This can be fatal when doing a reset, revert or checkout, so it
     should be shown. Alternatively when applied on a submodule, forcing
     could be disabled to let the command fail instead of throwing stuff
     away.

  d) a detached HEAD not on any remote branch
     AFAICS this is only important for a push, and could just error out
     there.

(But i don't think it is necessary to show detailed information, just
what type of states are found in the submodule)
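
To make the states a) to c) above concrete, here is a minimal sketch of
how they could be detected from outside the submodule. The function name
submodule_state and the labels it prints are made up for illustration;
this is not an existing git command, and it assumes a git that has
"for-each-ref --contains".

```shell
# Hypothetical helper: print one line per kind of "dirty" state found in
# the submodule checked out at path $1. Prints nothing if it is clean.
submodule_state() {
    (
        cd "$1" || exit 1
        # a) modified tracked files, whether staged in the submodule's
        #    index or not (work tree compared against HEAD)
        if ! git diff --quiet HEAD --; then
            echo "modified"
        fi
        # b) new files that are not ignored
        if [ -n "$(git ls-files --others --exclude-standard)" ]; then
            echo "untracked"
        fi
        # c) HEAD is not contained in any local or remote-tracking branch
        if [ -z "$(git for-each-ref --contains HEAD refs/heads refs/remotes)" ]; then
            echo "unreferenced-head"
        fi
    )
}
```

Calling "submodule_state sub" from the superproject would then give just
the type of state, without any per-file detail. Case d) is left out here
because it only matters at push time.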

Concerning Dscho's remarks about the performance impact: We could control
this behavior via .gitmodules too (and later have different settings
for the submodules depending on the group the user chose). So you could
turn these checks off for repos where you don't care, saving the time to
go through the whole working directory of the submodule. But i would vote
for the default to show at least case a) and maybe even c) to follow the
principle of least surprise.


> I think "clone" has a chicken-and-egg problem.  If all of your project
> participants are expected to check out all the submodules, are expected to
> make commits in all of them, and essentially have to track everything in
> sync, then "clone" can obviously do that without asking what kind of
> participant you are [*1*].  Otherwise, you need to have some mechanism
> (e.g. "group mapping" you mentioned earlier) for the user to specify "I am
> interested in these submodules" before the actual sub-clones happen,
> but until you clone the superproject that has some description for that
> mechanism to use, and the user to see what's available, you cannot say
> what kind of participant you are.  It has to become two-step process;
> either "clone" going interactive in the middle, or you let the clone
> happen and then "submodule init" to express that information.

Yes, we can leave it that way for now (first "clone" and then "submodule
init <the submodules you need>"). We can migrate to the "group mapping"
functionality later (which would then allow to force certain submodules
to always be populated because they appear in every group).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05  9:46         ` Johannes Schindelin
@ 2010-01-05 12:19           ` Jens Lehmann
  2010-01-05 14:27           ` Heiko Voigt
  1 sibling, 0 replies; 45+ messages in thread
From: Jens Lehmann @ 2010-01-05 12:19 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Git Mailing List, Junio C Hamano, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli, Avery Pennarun

Am 05.01.2010 10:46, schrieb Johannes Schindelin:
> But I have a use case here where the shared content is _not_ a library 
> that can live in a subdirectory naturally.

Yes, we had to reorganize a major part of one project too. Heiko could
tell more about that.


>> Having read up about svn externals in the meantime, what about something
>> like this:
>> - Add a command like "git submodule forward" (as update is already in
>>   use) that takes an optional -b <branchname>. It does a fetch in the
>>   submodule, then tries to fast forward (or rebase) to master or the
>>   branch given and stages this commit in the superproject. This should
>>   be the equivalent to doing an "svn update" in a repo with externals.
>>   Or am i missing something?
> 
> Yes.  It is not the decision of the fetcher, but of the guy who adds the 
> submodule to decide what it is.
>
>> - We could also add an option to "git submodule add" to specify the
>>   default branch name for forward.
> 
> That's an obvious precondition for proper always-tip-submodules.  But 
> Git's core data structure, the index, does not allow for it.  _That_ is 
> the difficulty, not what the user interface would look like.

I have never experienced (and never had the need for) such an always-tip
scenario and therefore still seem to have difficulty grokking it. I
assume you always want to have the newest tip at /checkout/ time, not at
/commit/ time? Then my proposal would really not help you.
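
For reference, the "forward" step proposed above could be sketched
roughly like this. "git submodule forward" does not exist; the function
name submodule_forward, the remote name "origin" and the default branch
"master" are assumptions, and only the fast-forward case is covered (no
rebase).

```shell
# Hypothetical sketch of "git submodule forward"; run from the top of the
# superproject.
# $1: submodule path, $2: branch to follow (defaults to master)
submodule_forward() {
    path=$1 branch=${2:-master}
    (
        cd "$path" &&
        git fetch -q origin &&
        # refuse anything that is not a fast-forward
        git merge -q --ff-only "origin/$branch"
    ) &&
    # stage the new submodule commit in the superproject
    git add "$path"
}
```

This moves the submodule at the time the command is run and records the
result in the superproject's index, so it is still a /commit/ time
decision and not a /checkout/ time one.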


> I start to wonder whether the insistence that .gitmodules' settings must 
> be overrideable makes any sense in practice.

I know of none, maybe someone else can speak up here?
(And even if it is overrideable, do the settings necessarily have to be
copied into .git/config when they aren't even overridden?)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05  9:46         ` Johannes Schindelin
  2010-01-05 12:19           ` Jens Lehmann
@ 2010-01-05 14:27           ` Heiko Voigt
  2010-01-05 15:07             ` Johan Herland
                               ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Heiko Voigt @ 2010-01-05 14:27 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Lars Hjemli, Avery Pennarun

On Tue, Jan 05, 2010 at 10:46:11AM +0100, Johannes Schindelin wrote:
> On Tue, 5 Jan 2010, Jens Lehmann wrote:
> > Yes. This synchronization could be either obsoleted by only using
> > .gitmodules or automated.
> 
> I start to wonder whether the insistence that .gitmodules' settings must 
> be overrideable makes any sense in practice.

I just read this and felt the need to comment.

Yes, it definitely makes sense in practice to have it overrideable;
otherwise we lose the distributed nature of git for submodules.

Imagine you fork a project and you want to work with others on a change
that involves changing a subproject. If you cannot override .gitmodules
you can only work on the central repository.

I am actually working like this in practice. I have a private clone of
all the subprojects msysgit has and commit/push locally first. Once I
sense the change is going to be useful for a wider audience I send it
upstream. This would be more uncomfortable if it were not overrideable.

But I know what you mean by the general confusion about manual updates.
So how about an approach like this:

* clone will initialise all submodules in .git/config from .gitmodules

* if a change in .gitmodules happens git scans .git/config for that
  entry and in case nothing is there it synchronises the new one and
  notifies the user.

* if a change in .gitmodules happens and the entry before was the same
  in .git/config we also automatically update that entry there.

* In every other case we just leave .git/config alone.

Did I miss anything? I think you should get the idea and that it could
get rid of the confusion caused by manual .gitmodules updates.
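
The rules above boil down to a small three-way decision per submodule
entry. A minimal sketch, assuming only the URL is compared; the function
name sync_submodule_url and its argument order are made up for
illustration:

```shell
# Hypothetical helper implementing the proposed rules for one entry.
# $1: URL in the previous version of .gitmodules (empty if the entry is new)
# $2: URL in the new version of .gitmodules
# $3: URL currently in .git/config (empty if there is no entry)
# Prints the URL that .git/config should hold afterwards.
sync_submodule_url() {
    old=$1 new=$2 cfg=$3
    if [ -z "$cfg" ]; then
        # nothing configured yet: take the new entry and notify the user
        echo "note: taking new submodule URL $new" >&2
        echo "$new"
    elif [ "$cfg" = "$old" ]; then
        # the user never overrode the old value: follow .gitmodules
        echo "$new"
    else
        # local override: leave .git/config alone
        echo "$cfg"
    fi
}
```

The third branch is what keeps a private clone working: once the value in
.git/config differs from what .gitmodules used to say, it is never
touched again.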

cheers Heiko

P.S.: Additionally (for my use case) we could add a "hint mechanism"
which allows git to "guess" a new submodule's address. For example in
case I have all my local clones on "git@my.server.net:<modulename>.git".
Now when a new submodule gets seen in .gitmodules it will infer the
address from the hint configuration and not take the original one from
upstream.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 14:27           ` Heiko Voigt
@ 2010-01-05 15:07             ` Johan Herland
  2010-01-05 15:30             ` Johannes Schindelin
  2010-01-05 22:37             ` Nanako Shiraishi
  2 siblings, 0 replies; 45+ messages in thread
From: Johan Herland @ 2010-01-05 15:07 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: git, Johannes Schindelin, Jens Lehmann, Junio C Hamano,
	Shawn O. Pearce, Paul Mackerras, Lars Hjemli, Avery Pennarun

On Tuesday 05 January 2010, Heiko Voigt wrote:
> P.S.: Additionally (for my use case) we could add a "hint mechanism"
> which allows git to "guess" a new submodule's address. For example in
> case I have all my local clones on
> "git@my.server.net:<modulename>.git". Now when a new submodule gets
> seen in .gitmodules it will infer the address from the hint
> configuration and not take the original one from upstream.

This can be achieved today, if the upstream .gitmodules uses relative 
submodule URLs. I normally place super-repo and submodules in a single 
directory on the server, and use submodule URLs of the 
form "../<modulename>.git". Now, downstream developers can "git 
clone --mirror" the repos from my server, and - as long as they 
preserve the directory layout - provide their own complete server 
mirror, without editing .gitmodules. Granted, the existing submodule 
tools don't make working with relative submodule URLs particularly
easy...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 14:27           ` Heiko Voigt
  2010-01-05 15:07             ` Johan Herland
@ 2010-01-05 15:30             ` Johannes Schindelin
  2010-01-05 22:37             ` Nanako Shiraishi
  2 siblings, 0 replies; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-05 15:30 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Lars Hjemli, Avery Pennarun

Hi,

On Tue, 5 Jan 2010, Heiko Voigt wrote:

> On Tue, Jan 05, 2010 at 10:46:11AM +0100, Johannes Schindelin wrote:
> > On Tue, 5 Jan 2010, Jens Lehmann wrote:
> > > Yes. This synchronization could be either obsoleted by only using
> > > .gitmodules or automated.
> > 
> > I start to wonder whether the insistence that .gitmodules' settings must 
> > be overrideable makes any sense in practice.
> 
> I just read this and felt the need to comment.
> 
> > Yes, it definitely makes sense in practice to have it overrideable;
> > otherwise we lose the distributed nature of git for submodules.

AFAICT you can use url.<base>.insteadOf for that.

Or maybe even better use a different remote for that, as you are likely 
wanting to stay up-to-date with the upstream projects even if you work on 
the stuff locally.
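
A minimal sketch of the insteadOf idea, assuming the upstream
repositories share one URL prefix and the private clones live under
another. The helper name use_mirror and both example URLs are
hypothetical; whether the setting belongs in the repository or in the
user-wide configuration is left open here.

```shell
# Hypothetical helper: in the current repository, rewrite every URL that
# starts with the upstream prefix $1 so that it starts with the mirror
# prefix $2 instead.
use_mirror() {
    git config "url.$2.insteadOf" "$1"
}
```

After "use_mirror git://upstream.example.org/ git@my.server.net:", a
remote recorded as git://upstream.example.org/lib.git is contacted as
git@my.server.net:lib.git, and "git ls-remote --get-url origin" shows the
rewritten address without talking to the server.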

> But I know what you mean by the general confusion about manual updates.
> So how about an approach like this:
> 
> * clone will initialise all submodules in .git/config from .gitmodules
> 
> * if a change in .gitmodules happens git scans .git/config for that
> >   entry and in case nothing is there it synchronises the new one and
>   notifies the user.
> 
> * if a change in .gitmodules happens and the entry before was the same
>   in .git/config we also automatically update that entry there.
> 
> * In every other case we just leave .git/config alone.

I'm sorry, but this is the kind of stuff I am seeing in Git: a lot of 
really complicated design with a lot of corner cases, put on top of a 
really simple and elegant design.

So I'd like to see a solution that is obviously superior by being 
plain simple.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 11:57           ` Jens Lehmann
@ 2010-01-05 18:31             ` Junio C Hamano
  2010-01-05 20:01               ` Jens Lehmann
  2010-01-05 23:02               ` Johannes Schindelin
  0 siblings, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-01-05 18:31 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Jens Lehmann <Jens.Lehmann@web.de> writes:

> The only thing we show right now are the differences between submodule
> commits and what the superproject has in its index and in its commits.
> Missing are:
>
>   a) modified files
> ...
>   b) new unignored files
>      IMO these files should show up too (the superproject doesn't show
>      ignored files, the submodule state shouldn't do that either). But
>      OTOH i don't see a possibility for loss of data when this state is
>      not shown.

I don't know if we are talking about the same scenario.  What I had in
mind was:

    cd sub
    edit new-file
    tests ok and be happy
    git commit
    cd ..
    git status
    git commit

forgetting that only you have sub/new-file in the world.  It is not loss
of data, but still bad.  Forgetting to add a new-file and committing in a
project without submodule doesn't lose data, but the resulting commit will
be seen as broken by other people.

>   c) a detached HEAD not on any local *or* remote branch
>      This can be fatal when doing a reset, revert or checkout, so it
>      should be shown. Alternatively when applied on a submodule, forcing
>      could be disabled to let the command fail instead of throwing stuff
>      away.

Sorry, I am lost.  Are you worried about "reset/revert/checkout" in the
superproject?  What destructive things do these operations do that you
consider "fatal"?  I am especially puzzled by "revert", as "commit",
"cherry-pick", and "merge" would have the same "fatal" effect as "revert",
but I don't get what "fatality" you are talking about here.

>   d) a detached HEAD not on any remote branch
>      AFAICS this is only important for a push, and could just error out
>      there.

Likewise.

>> I think "clone" has a chicken-and-egg problem.  If all of your project
>> ...
>> what kind of participant you are.  It has to become two-step process;
>> either "clone" going interactive in the middle, or you let the clone
>> happen and then "submodule init" to express that information.
>
> Yes, we can leave it that way for now (first "clone" and then "submodule
> init <the submodules you need>"). We can migrate to the "group mapping"
> functionality later (which would then allow to force certain submodules
> to always be populated because they appear in every group).

Even with group mapping, you need to clone the superproject first, before
seeing the mapping (which I would assume comes in the superproject).  And
you need to see the mapping to decide what group you belong to.  After
that you can finally drive sub-clone to continue (e.g. I work in the
documentation area, and the group mapping has 'docs' that lets me pull in
submodules for doc/ and common/ directories, without src/ submodule --- I
can only learn that the submodules I am interested in are called 'docs' by
group name or doc/ and common/ subdirectories _after_ I get the clone of
the superproject).

I don't know if "this appears in all groups so let's always sub-clone it"
is very useful in practice, but some sort of mandatory clone/checkout
mechanism would be handy.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 18:31             ` Junio C Hamano
@ 2010-01-05 20:01               ` Jens Lehmann
  2010-01-06  1:04                 ` Junio C Hamano
  2010-01-05 23:02               ` Johannes Schindelin
  1 sibling, 1 reply; 45+ messages in thread
From: Jens Lehmann @ 2010-01-05 20:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Am 05.01.2010 19:31, schrieb Junio C Hamano:
> Jens Lehmann <Jens.Lehmann@web.de> writes:
>>   b) new unignored files
>>      IMO these files should show up too (the superproject doesn't show
>>      ignored files, the submodule state shouldn't do that either). But
>>      OTOH i don't see a possibility for loss of data when this state is
>>      not shown.
> 
> I don't know if we are talking about the same scenario.  What I had in
> mind was:
> 
>     cd sub
>     edit new-file
>     tests ok and be happy
>     git commit
>     cd ..
>     git status
>     git commit
> 
> forgetting that only you have sub/new-file in the world.  It is not loss
> of data, but still bad.  Forgetting to add a new-file and committing in a
> project without submodule doesn't lose data, but the resulting commit will
> be seen as broken by other people.

I'm not quite sure, i was rather thinking about something like this:

    cd sub
    edit new-file
    cd ..
    <use sub/new-file here, test ok and be happy>
    git status
    git commit
    git push

git status won't show you that sub has any new files and so you won't be
reminded that you still have to add, commit and push it in the submodule
before you should even commit, let alone push in the superproject.

It is a possible breakage for other people if sub/new-file stays unnoticed.
That's IMO a good point for showing these files too.


>>   c) a detached HEAD not on any local *or* remote branch
>>      This can be fatal when doing a reset, revert or checkout, so it
>>      should be shown. Alternatively when applied on a submodule, forcing
>>      could be disabled to let the command fail instead of throwing stuff
>>      away.
> 
> Sorry, I am lost.  Are you worried about "reset/revert/checkout" in the
> superproject?  What destructive things do these operations do that you
> consider "fatal"?  I am especially puzzled by "revert", as "commit",
> "cherry-pick", and "merge" would have the same "fatal" effect as "revert",
> but I don't get what "fatality" you are talking about here.

Sorry, that was an incomplete description on my part.

My mind had already been warped into the - hopefully not too distant -
future where these commands will be able to recurse into submodules too
(I ran into this issue recently while trying to teach git gui to revert
submodules). Right now we are blind to this state of the submodule unless
you go inside and use "git status" and friends there. And if you use e.g.
"git reset --hard" there, you can lose the commits on HEAD which aren't
on any branch.


>>   d) a detached HEAD not on any remote branch
>>      AFAICS this is only important for a push, and could just error out
>>      there.
> 
> Likewise.

This can be bad in the same way that new unignored files can be (and
there is no time travel involved this time ;-). With HEAD i meant the
submodule commit committed and about to be pushed in the superproject
(which happens to be the HEAD of the submodule most of the time, but
not always). So you committed sub/new-file but didn't push it anywhere.
This can lead to breakage for other people even with current git. I
think push could check for this and error out, as pushing out a
referenced submodule commit which is not pushed anywhere makes no sense.

But right now i don't believe we would have to show that in the output
of git diff-files and git status, because it is only relevant at the
time when you actually want to push the superproject.
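
A minimal sketch of such a check before pushing the superproject. The
function name submodule_commit_is_pushed is made up, and it only looks at
the submodule's remote-tracking branches, so it reflects the state of the
last fetch or push and not necessarily what the server has right now.

```shell
# Hypothetical pre-push check; run from the top of the superproject.
# $1: path of the submodule
# Succeeds only if the submodule commit recorded in the superproject's
# HEAD is contained in at least one remote-tracking branch of the
# submodule.
submodule_commit_is_pushed() {
    # the gitlink entry in the superproject's HEAD commit
    sha=$(git rev-parse --verify -q "HEAD:$1") || return 1
    [ -n "$(cd "$1" && git for-each-ref --contains "$sha" refs/remotes)" ]
}
```

A push wrapper could then do something like
"submodule_commit_is_pushed sub || exit 1" for every submodule and error
out before anything leaves the machine.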


>> Yes, we can leave it that way for now (first "clone" and then "submodule
>> init <the submodules you need>"). We can migrate to the "group mapping"
>> functionality later (which would then allow to force certain submodules
>> to always be populated because they appear in every group).
> 
> Even with group mapping, you need to clone the superproject first, before
> seeing the mapping (which I would assume comes in the superproject).  And
> you need to see the mapping to decide what group you belong to.  After
> that you can finally drive sub-clone to continue (e.g. I work in the
> documentation area, and the group mapping has 'docs' that lets me pull in
> submodules for doc/ and common/ directories, without src/ submodule --- I
> can only learn that the submodules I am interested in are called 'docs' by
> group name or doc/ and common/ subdirectories _after_ I get the clone of
> the superproject).

I think we agree here.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-04 22:29     ` submodules' shortcomings, was " Johannes Schindelin
                         ` (2 preceding siblings ...)
  2010-01-05  8:11       ` Jens Lehmann
@ 2010-01-05 20:38       ` Pau Garcia i Quiles
  2010-01-05 23:06         ` cmake, was Re: submodules' shortcomings Johannes Schindelin
  3 siblings, 1 reply; 45+ messages in thread
From: Pau Garcia i Quiles @ 2010-01-05 20:38 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli

Hello,

Let me pop here to support Johannes: I agree with every single point
he enumerated. Every. Single. Point.

For instance, I'd like to have a 'cmake' repository where I store all
the FindBlah.cmake modules, so that I can share them from every
repository, and not worry about users changing and committing in the
main project instead of the submodule. I can't. Subversion externals
still rule in that regard.

On Mon, Jan 4, 2010 at 11:29 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Mon, 4 Jan 2010, Jens Lehmann wrote:
>
>> Am 04.01.2010 10:44, schrieb Johannes Schindelin:
>> > The real problem is that submodules in the current form are not very
>> > well designed.
>>
>> IMVHO using the tree sha1 for a submodule seems to be the 'natural' way
>> to include another git repo. And it gives the reproducibility i expect
>> from a scm. Or am i missing something?
>
> You do remember the discussion at the Alles wird Git about the need for
> Subversion external-like behavior, right?
>
>> It looks to me as most shortcomings come from the fact that most git
>> commands tend to ignore submodules (and if they don't, like git gui and
>> gitk do now, they e.g. only show certain aspects of their state).
>
> It is not only ignoring.  It is not being able to cope with the state only
> submodules can be in (see below).
>
>> Submodules are in heavy use in our company since last year. Virtually
>> every patch i submitted for submodules came from that experience and
>> scratched an itch i or one of my colleagues had (and the situation did
>> already improve noticeably by the few things we changed). We are still
>> convinced that using submodules was the right decision. But some work
>> has still to be done to be able to use them easily and to get rid of
>> some pitfalls.
>
> Submodules may be the best way you have in Git for your workflow ATM.
> But that does not mean that the submodule design is in any way
> thought-through.
>
> Just a few shortcomings that do show up in my main project (and to a
> small extent in msysGit, as you are probably aware):
>
> - submodules were designed with a strong emphasis on not being forced to
>  check them out.  But Git makes it very inconvenient to actually check
>  submodules out, let alone check them out at clone-time.  And it is
>  outright impossible to _enforce_ a submodule to be checked out.
>
> - among other use cases, submodules are recommended for sharing content
>  between two different repositories. But it is part of the design that it
>  is _very_ easy to forget to commit, or push the changes in the submodule
>  that are required for the integrity of the superproject.
>
> - that use case -- sharing content between different repositories -- is
>  not really supported by submodules, but rather an afterthought.  This is
>  all too obvious when you look at the restriction that the shared content
>  must be in a single subdirectory.
>
> - submodules would be a perfect way to provide a fast-forward-only media
>  subdirectory that is written to by different people (artists) than to
>  the superproject (developers).  But there is no mechanism to enforce
>  shallow fetches, which means that this use case cannot be handled
>  efficiently using Git.
>
> - related are the use cases where it is desired not to have a fixed
>  submodule tip committed to the superproject, but always to update to the
>  current, say, master (like Subversion's externals).  This use case has
>  been wished away by the people who implemented submodules in Git.  But
>  reality has this nasty habit of ignoring your wishes, does it not?
>
> - there have been patches supporting rebasing submodules, i.e.
>  submodules where a "git submodule update" rebases the current branch to
>  the revision committed to the superproject rather than detaching the
>  HEAD, which everybody who ever contributed to a project with submodules
>  should agree is a useful thing. But the patches only have been discussed
>  to death, to the point where the discussion's information content was
>  converging to zero, yet the patches did not make it into Git.  (FWIW
>  this is one reason why I refuse to write patches to git-submodule.sh: I
>  refuse to let my time be wasted like that.)
>
> - working directories with GIT_DIRs are a very different beast from single
>  files.  That alone leads to a _lot_ of problems.  The original design of
>  Git had only a couple of states for named content (AKA files): clean,
>  added, removed, modified.  The states that are possible with submodules
>  are for the most part not handled _at all_ by most Git commands (and it
>  is sometimes very hard to decide what would be the best way to handle
>  those states, either).  Just think of a submodule at a different
>  revision than committed in the superproject, with uncommitted changes,
>  ignored and unignored files, a few custom hooks, a bit of additional
>  metadata in the .git/config, and just for fun, a few temporary files in
>  .git/ which are used by the hooks.
>
> - while it might be called clever that the submodules' metadata are stored
>  in .gitmodules in the superproject (and are therefore naturally tracked
>  with Git), the synchronization with .git/config is performed exactly
>  once -- when you initialize the submodule.  You are likely to miss out
>  on _every_ change you pulled into the superproject.
>
> All in all, submodules are very clumsy to work with, and you are literally
> forced to provide scripts in the superproject to actually work with the
> submodules.
>
>> > In the short run, we can paper over the shortcomings of the submodules
>> > by introducing a command line option "--include-submodules" to
>> > update-refresh, diff-files and diff-index, though.
>>
>> Maybe this is the way to go for now (and hopefully we can turn this
>> option on by default later because we did the right thing ;-).
>
> I do not think that --include-submodules is a good default.  It is just
> too expensive in terms of I/O even to check the status in a superproject
> with a lot of submodules.
>
> Besides, as long as there is enough reason to have out-of-Git alternative
> solutions such as repo, submodules deserve to be 2nd-class citizens.
>
> Ciao,
> Dscho
>


-- 
Pau Garcia i Quiles
http://www.elpauer.org
(Due to my workload, I may need 10 days to answer)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 14:27           ` Heiko Voigt
  2010-01-05 15:07             ` Johan Herland
  2010-01-05 15:30             ` Johannes Schindelin
@ 2010-01-05 22:37             ` Nanako Shiraishi
  2010-01-05 23:13               ` Johannes Schindelin
  2 siblings, 1 reply; 45+ messages in thread
From: Nanako Shiraishi @ 2010-01-05 22:37 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Johannes Schindelin, Jens Lehmann, Git Mailing List,
	Junio C Hamano, Shawn O. Pearce, Paul Mackerras, Lars Hjemli,
	Avery Pennarun

Quoting Heiko Voigt <hvoigt@hvoigt.net>

> On Tue, Jan 05, 2010 at 10:46:11AM +0100, Johannes Schindelin wrote:
>> On Tue, 5 Jan 2010, Jens Lehmann wrote:
>> > Yes. This synchronization could be either obsoleted by only using
>> > .gitmodules or automated.
>> 
>> I start to wonder whether the insistence that .gitmodules' settings must 
>> be overrideable makes any sense in practice.
>
> I just read this and felt the need to comment.
>
> Yes, it definitely makes sense in practise to have it overrideable;
> otherwise we lose the distributed nature of git for submodules.
>
> Imagine you fork a project and you want to work with others on a change
> that involves changing a subproject. If you cannot override .gitmodules
> you can only work on the central repository.
>
> I am actually working like this in practise. I have a private clone of
> all the subprojects msysgit has and commit/push locally first. Once I
> sense the change is going to be useful for a wider audience I send it
> upstream. This would be more uncomfortable if it were not overrideable.
>
> But I know what you mean by the general confusion about manual updates.
> So how about an approach like this:
>
> * clone will initialise all submodules in .git/config from .gitmodules
>
> * if a change in .gitmodules happens git scans .git/config for that
>   entry and in case nothing is there it synchronises the new one and
>   notifies the user.
>
> * if a change in .gitmodules happens and the entry before was the same
>   in .git/config we also automatically update that entry there.
>
> * In every other case we just leave .git/config alone.
>
> Did I miss anything? I think you should get the idea and that it could
> get rid of the confusion caused by manual .gitmodules updates.
>
> cheers Heiko
>
> P.S.: Additionally (for my use case) we could add a "hint mechanism"
> which allows git to "guess" a new submodules address. For example in
> case I have all my local clones on "git@my.server.net:<modulename>.git".
> Now when a new submodule gets seen in .gitmodules it will infer the
> address from the hint configuration and not take the original one from
> upstream.
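
The four rules proposed above can be read as a small decision function.
What follows is only a sketch in Python to make them concrete; it is not
how git behaves, and the function name, its parameters and the "notify"
flag are made up for illustration. In particular, the proposal does not
say whether the user is notified in the automatic-update case, so the
sketch assumes not.

```python
from typing import Optional, Tuple


def sync_submodule_url(
    old_gitmodules_url: Optional[str],
    new_gitmodules_url: str,
    config_url: Optional[str],
) -> Tuple[str, bool]:
    """Decide what .git/config should hold after .gitmodules changed.

    Returns (url_for_git_config, notify_user). All names are hypothetical.
    """
    if config_url is None:
        # Fresh clone or new submodule: nothing in .git/config yet,
        # so copy the entry from .gitmodules and tell the user.
        return new_gitmodules_url, True
    if config_url == old_gitmodules_url:
        # The user never overrode the old value, so follow upstream.
        # (Assumption: no notification is needed in this case.)
        return new_gitmodules_url, False
    # The user has a local override: leave .git/config alone.
    return config_url, False
```

With this reading, a private override such as a local clone URL survives
any number of upstream .gitmodules changes, while untouched entries keep
tracking upstream.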

Thanks for sharing your thoughts. I find this discussion very interesting.

I found this other discussion in the design area enlightening.

http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47621

It took place before I started using git heavily, and I don't see many of its participants in the current thread yet, but I think it is worth reading.

P.S. A happy new year to everybody!

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 18:31             ` Junio C Hamano
  2010-01-05 20:01               ` Jens Lehmann
@ 2010-01-05 23:02               ` Johannes Schindelin
  1 sibling, 0 replies; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-05 23:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jens Lehmann, Git Mailing List, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli, Avery Pennarun

Hi,

On Tue, 5 Jan 2010, Junio C Hamano wrote:

> Jens Lehmann <Jens.Lehmann@web.de> writes:
> 
> >> I think "clone" has a chicken-and-egg problem.  If all of your 
> >> project ... what kind of participant you are.  It has to become 
> >> two-step process; either "clone" going interactive in the middle, or 
> >> you let the clone to happen and then "submodule init" to express that 
> >> information.
> >
> > Yes, we can leave it that way for now (first "clone" and then 
> > "submodule init <the submodules you need>"). We can migrate to the 
> > "group mapping" functionality later (which would then allow to force 
> > certain submodules to always be populated because they appear in every 
> > group).
> 
> Even with group mapping, you need to clone the superproject first, before
> seeing the mapping (which I would assume comes in the superproject).

That's just like saying "you only see the URL first, and you have to clone 
before you see what the project is about".

So in effect you are saying that things are bad.  But you do not take the 
leap of imagination to say what we need to improve.

There are quite a number of settings which could benefit from git-clone -- 
finally -- learning to take more information than just the URL; autocrlf 
and submodules' "grouping" (which is a lousy name, by the way) being the 
most prominent examples (which the core Git developers very obviously do 
not use, otherwise the state of things would not be as sorry as it is).

Ciao,
Dscho


* cmake, was Re: submodules' shortcomings
  2010-01-05 20:38       ` Pau Garcia i Quiles
@ 2010-01-05 23:06         ` Johannes Schindelin
  2010-01-06  1:17           ` Pau Garcia i Quiles
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-05 23:06 UTC (permalink / raw)
  To: Pau Garcia i Quiles; +Cc: Git Mailing List

Hi,

On Tue, 5 Jan 2010, Pau Garcia i Quiles wrote:

> For instance, I'd like to have a 'cmake' repository where I store all
> the FindBlah.cmake modules, so that I can share them from every
> repository, and not worry about users changing and committing in the
> main project instead of the submodule.

... which reminds me... it was you who wanted to provide a working recipe 
to compile and install CMake on msysGit, right?

What happened in the meantime?

Ciao,
Dscho


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 22:37             ` Nanako Shiraishi
@ 2010-01-05 23:13               ` Johannes Schindelin
  2010-01-07 11:04                 ` Nanako Shiraishi
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-05 23:13 UTC (permalink / raw)
  To: Nanako Shiraishi
  Cc: Heiko Voigt, Jens Lehmann, Git Mailing List, Junio C Hamano,
	Shawn O. Pearce, Paul Mackerras, Lars Hjemli, Avery Pennarun

Hi,

On Wed, 6 Jan 2010, Nanako Shiraishi wrote:

> I found this other discussion in the design area enlightening.
> 
> http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47621

Could you be so kind and summarize the result of the thread in something 
like 2000 characters?

I am sorry, but what with the recent trend of a precious few Git mailing 
list members using up my weekly Git time budget in less than half a day, 
just by me reading their mails, it would be nice if at least _some_ 
discussions on the list could be concise and to the point.

Thanks,
Dscho


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 20:01               ` Jens Lehmann
@ 2010-01-06  1:04                 ` Junio C Hamano
  2010-01-06 14:05                   ` Jens Lehmann
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-01-06  1:04 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Jens Lehmann <Jens.Lehmann@web.de> writes:

> Am 05.01.2010 19:31, schrieb Junio C Hamano:
>> Jens Lehmann <Jens.Lehmann@web.de> writes:
>>>   b) new unignored files
>>>      IMO these files should show up too (the superproject doesn't show
>>>      ignored files, the submodule state shouldn't do that either). But
>>>      OTOH i don't see a possibility for loss of data when this state is
>>>      not shown.
>> 
>> I don't know if we are talking about the same scenario.  What I had in
>> mind was:
>> 
>>     cd sub
>>     edit new-file
>>     tests ok and be happy
>>     git commit
>>     cd ..
>>     git status
>>     git commit
>> 
>> forgetting that only you have sub/new-file in the world.  It is not loss
>> of data, but still bad.  Forgetting to add a new-file and committing in a
>> project without submodule doesn't lose data, but the resulting commit will
>> be seen as broken by other people.
>
> I'm not quite sure, i was rather thinking about something like this:
>
>     cd sub
>     edit new-file
>     cd ..
>     <use sub/new-file here, test ok and be happy>
>     git status
>     git commit
>     git push
>
> git status won't show you that sub has any new files and so you won't be
> reminded that you still have to add, commit and push it in the submodule
> before you should even commit, let alone push in the superproject.
>
> It is a possible breakage for other people if sub/new-file stays unnoticed.
> That's IMO a good point for showing these files too.

Yeah, your "i don't see a possibility for loss of data when this state is
not shown" confused me into thinking you were saying it is not _too_
bad if we didn't show the information.

After all we _were_ in agreement.  We both think the user should be told
about untracked files in submodule directory when inspecting the status to
make a commit in the superproject.


* Re: cmake, was Re: submodules' shortcomings
  2010-01-05 23:06         ` cmake, was Re: submodules' shortcomings Johannes Schindelin
@ 2010-01-06  1:17           ` Pau Garcia i Quiles
  2010-01-06  4:25             ` Miles Bader
  2010-01-06  9:24             ` Johannes Schindelin
  0 siblings, 2 replies; 45+ messages in thread
From: Pau Garcia i Quiles @ 2010-01-06  1:17 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Git Mailing List

On Wed, Jan 6, 2010 at 12:06 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Tue, 5 Jan 2010, Pau Garcia i Quiles wrote:
>
>> For instance, I'd like to have a 'cmake' repository where I store all
>> the FindBlah.cmake modules, so that I can share them from every
>> repository, and not worry about users changing and committing in the
>> main project instead of the submodule.
>
> ... which reminds me... it was you who wanted to provide a working recipe
> to compile and install CMake on msysGit, right?

Right

> What happened in the meantime?

What happened is I was very busy until November. Now I've got some free time.

At this moment, what stops me from beginning this project is a simple
question: is it worth my time? From the discussion a few months ago,
it looked like it would be a second-class citizen and never replace
the existing buildsystems, so I really wonder if I should spend my
time porting git to CMake, or I should focus on other projects which
would gladly receive my contributions. If you honestly think it's
worth it, just tell me and I'll start the port to CMake immediately.

-- 
Pau Garcia i Quiles
http://www.elpauer.org
(Due to my workload, I may need 10 days to answer)


* Re: cmake, was Re: submodules' shortcomings
  2010-01-06  1:17           ` Pau Garcia i Quiles
@ 2010-01-06  4:25             ` Miles Bader
  2010-01-06  9:24             ` Johannes Schindelin
  1 sibling, 0 replies; 45+ messages in thread
From: Miles Bader @ 2010-01-06  4:25 UTC (permalink / raw)
  To: Pau Garcia i Quiles; +Cc: Johannes Schindelin, Git Mailing List

Pau Garcia i Quiles <pgquiles@elpauer.org> writes:
> At this moment, what stops me from beginning this project is a simple
> question: is it worth my time? From the discussion a few months ago,
> it looked like it would be a second-class citizen and never replace
> the existing buildsystems, so I really wonder if I should spend my
> time porting git to CMake, or I should focus on other projects which
> would gladly receive my contributions. If you honestly think it's
> worth it, just tell me and I'll start the port to CMake immediately.

It sounds like it's you who want it, so aren't you the best person to
make that judgement...?  It seems very unlikely for cmake to replace
anything.

-Miles

-- 
Politeness, n. The most acceptable hypocrisy.


* Re: cmake, was Re: submodules' shortcomings
  2010-01-06  1:17           ` Pau Garcia i Quiles
  2010-01-06  4:25             ` Miles Bader
@ 2010-01-06  9:24             ` Johannes Schindelin
  1 sibling, 0 replies; 45+ messages in thread
From: Johannes Schindelin @ 2010-01-06  9:24 UTC (permalink / raw)
  To: Pau Garcia i Quiles; +Cc: Git Mailing List

Hi,

On Wed, 6 Jan 2010, Pau Garcia i Quiles wrote:

> On Wed, Jan 6, 2010 at 12:06 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Tue, 5 Jan 2010, Pau Garcia i Quiles wrote:
> >
> >> For instance, I'd like to have a 'cmake' repository where I store all 
> >> the FindBlah.cmake modules, so that I can share them from every 
> >> repository, and not worry about users changing and committing in the 
> >> main project instead of the submodule.
> >
> > ... which reminds me... it was you who wanted to provide a working 
> > recipe to compile and install CMake on msysGit, right?
> 
> Right
> 
> > What happened in the meantime?
> 
> What happened is I was very busy until November. Now I've got some free 
> time.
> 
> At this moment, what stops me from beginning this project is a simple 
> question: is it worth my time?

Well, I thought you wanted to show that CMake is superior to what we have 
right now, and for me as msysGit maintainer, that implies that CMake 
actually works within msysGit.

Now, I do not think that it is hard to get CMake to compile in msysGit, 
but then, I just lost access to the last Windows computer, so I cannot do 
that myself.

As Miles said, it is up to you to decide whether it is so complicated, or 
whether CMake is likely not to convince, that the time balance turns out 
positive or negative.

Ciao,
Dscho


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-06  1:04                 ` Junio C Hamano
@ 2010-01-06 14:05                   ` Jens Lehmann
  2010-01-06 17:01                     ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Jens Lehmann @ 2010-01-06 14:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Am 06.01.2010 02:04, schrieb Junio C Hamano:
> After all we _were_ in agreement.  We both think the user should be told
> about untracked files in submodule directory when inspecting the status to
> make a commit in the superproject.

Thanks. So i'll take a closer look at the diff core (but i suspect i'll
need some time until i can come up with some patches because i don't know
this part of git very well).


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-06 14:05                   ` Jens Lehmann
@ 2010-01-06 17:01                     ` Junio C Hamano
  2010-01-06 17:23                       ` Nguyen Thai Ngoc Duy
  2010-01-06 18:20                       ` Jens Lehmann
  0 siblings, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-01-06 17:01 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Jens Lehmann <Jens.Lehmann@web.de> writes:

> Am 06.01.2010 02:04, schrieb Junio C Hamano:
>> After all we _were_ in agreement.  We both think the user should be told
>> about untracked files in submodule directory when inspecting the status to
>> make a commit in the superproject.
>
> Thanks. So i'll take a closer look at the diff core (but i suspect i'll
> need some time until i can come up with some patches because i don't know
> this part of git very well).

I don't see a direct connection between "the user should be told about
untracked in the submodule before committing" and diffcore.  It is just
the matter of "git status" and "git commit" running another instance of
"git status" via run_command() interface in the submodule directory, no?


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule  working directory in git gui and gitk
  2010-01-06 17:01                     ` Junio C Hamano
@ 2010-01-06 17:23                       ` Nguyen Thai Ngoc Duy
  2010-01-06 17:55                         ` Junio C Hamano
  2010-01-06 18:20                       ` Jens Lehmann
  1 sibling, 1 reply; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-06 17:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jens Lehmann, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli,
	Avery Pennarun

On 1/7/10, Junio C Hamano <gitster@pobox.com> wrote:
> Jens Lehmann <Jens.Lehmann@web.de> writes:
>
>
> > Am 06.01.2010 02:04, schrieb Junio C Hamano:
>  >> After all we _were_ in agreement.  We both think the user should be told
>  >> about untracked files in submodule directory when inspecting the status to
>  >> make a commit in the superproject.
>  >
>  > Thanks. So i'll take a closer look at the diff core (but i suspect i'll
>  > need some time until i can come up with some patches because i don't know
>  > this part of git very well).
>
>
> I don't see a direct connection between "the user should be told about
>  untracked in the submodule before committing" and diffcore.  It is just
>  the matter of "git status" and "git commit" running another instance of
>  "git status" via run_command() interface in the submodule directory, no?

You would need to rewrite file paths so that files in submodules are
also relative to the same directory as files in supermodule (I tried
to do that with GIT_WORK_TREE and needed to change a bit). Or you
could show each "git status" output separately, which does not look as
nice as the former in my opinion.
-- 
Duy


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule  working directory in git gui and gitk
  2010-01-06 17:23                       ` Nguyen Thai Ngoc Duy
@ 2010-01-06 17:55                         ` Junio C Hamano
  2010-01-06 18:22                           ` Nguyen Thai Ngoc Duy
  2010-01-06 18:32                           ` Jens Lehmann
  0 siblings, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-01-06 17:55 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jens Lehmann, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli,
	Avery Pennarun

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On 1/7/10, Junio C Hamano <gitster@pobox.com> wrote:
>> Jens Lehmann <Jens.Lehmann@web.de> writes:
>>
>>
>> > Am 06.01.2010 02:04, schrieb Junio C Hamano:
>>  >> After all we _were_ in agreement.  We both think the user should be told
>>  >> about untracked files in submodule directory when inspecting the status to
>>  >> make a commit in the superproject.
>>  >
>>  > Thanks. So i'll take a closer look at the diff core (but i suspect i'll
>>  > need some time until i can come up with some patches because i don't know
>>  > this part of git very well).
>>
>>
>> I don't see a direct connection between "the user should be told about
>>  untracked in the submodule before committing" and diffcore.  It is just
>>  the matter of "git status" and "git commit" running another instance of
>>  "git status" via run_command() interface in the submodule directory, no?
>
> You would need to rewrite file paths so that files in submodules are
> also relative to the same directory as files in supermodule (I tried
> to do that with GIT_WORK_TREE and needed to change a bit). Or you
> could show each "git status" output separately, which does not look as
> nice as the former in my opinion.

You could show output separately if you want, but I think that is a
separate issue.

I was envisioning that the "git status" in submodule will be run with its
recent --porcelain option, and "git status" or "git commit" would read it
to postprocess and incorporate into its own output.
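
To make the postprocessing step concrete, here is a rough sketch in
Python (not git's C code) of how the superproject side might read the
submodule's "git status --porcelain" output and rewrite each path so it
is relative to the superproject. The function name is made up, and the
sketch assumes the simple "XY path" line form only; renames ("old -> new")
and quoted paths would need extra handling.

```python
import posixpath
from typing import List, Tuple


def rebase_porcelain(sub_path: str, porcelain_output: str) -> List[Tuple[str, str]]:
    """Turn submodule-relative status entries into superproject-relative ones.

    Each porcelain line is two status characters, a space, then the path.
    """
    entries = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            # Too short to hold "XY " plus a path; skip it.
            continue
        status, path = line[:2], line[3:]
        # Prefix the submodule directory so the path reads from the top.
        entries.append((status, posixpath.join(sub_path, path)))
    return entries
```

The caller is then free to merge these entries into its own listing or
to condense them further.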


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-06 17:01                     ` Junio C Hamano
  2010-01-06 17:23                       ` Nguyen Thai Ngoc Duy
@ 2010-01-06 18:20                       ` Jens Lehmann
  1 sibling, 0 replies; 45+ messages in thread
From: Jens Lehmann @ 2010-01-06 18:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli, Avery Pennarun

Am 06.01.2010 18:01, schrieb Junio C Hamano:
> Jens Lehmann <Jens.Lehmann@web.de> writes:
> 
>> Am 06.01.2010 02:04, schrieb Junio C Hamano:
>>> After all we _were_ in agreement.  We both think the user should be told
>>> about untracked files in submodule directory when inspecting the status to
>>> make a commit in the superproject.
>>
>> Thanks. So i'll take a closer look at the diff core (but i suspect i'll
>> need some time until i can come up with some patches because i don't know
>> this part of git very well).
> 
> I don't see a direct connection between "the user should be told about
> untracked in the submodule before committing" and diffcore.  It is just
> the matter of "git status" and "git commit" running another instance of
> "git status" via run_command() interface in the submodule directory, no?

Basically yes. But i also would like to teach "git diff" (when diffing
against the working directory of the superproject) to show these
submodule states too so that git gui and gitk will display them.


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule  working directory in git gui and gitk
  2010-01-06 17:55                         ` Junio C Hamano
@ 2010-01-06 18:22                           ` Nguyen Thai Ngoc Duy
  2010-01-06 18:32                           ` Jens Lehmann
  1 sibling, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-06 18:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jens Lehmann, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli,
	Avery Pennarun

On 1/7/10, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>  > You would need to rewrite file paths so that files in submodules are
>  > also relative to the same directory as files in supermodule (I tried
>  > to do that with GIT_WORK_TREE and needed to change a bit). Or you
>  > could show each "git status" output separately, which does not look as
>  > nice as the former in my opinion.
>
>
> You could show output separately if you want, but I think that is a
>  separate issue.
>
>  I was envisioning that the "git status" in submodule will be run with its
>  recent --porcelain option, and "git status" or "git commit" would read it
>  to postprocess and incorporate into its own output.

Nice option! I had to call a few "git diff" for that just because I
did not catch up with recent Git development :-(
-- 
Duy


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-06 17:55                         ` Junio C Hamano
  2010-01-06 18:22                           ` Nguyen Thai Ngoc Duy
@ 2010-01-06 18:32                           ` Jens Lehmann
  2010-01-06 20:01                             ` Junio C Hamano
  1 sibling, 1 reply; 45+ messages in thread
From: Jens Lehmann @ 2010-01-06 18:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyen Thai Ngoc Duy, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli,
	Avery Pennarun

Am 06.01.2010 18:55, schrieb Junio C Hamano:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> 
>> On 1/7/10, Junio C Hamano <gitster@pobox.com> wrote:
>>> Jens Lehmann <Jens.Lehmann@web.de> writes:
>>>
>>>
>>>> Am 06.01.2010 02:04, schrieb Junio C Hamano:
>>>  >> After all we _were_ in agreement.  We both think the user should be told
>>>  >> about untracked files in submodule directory when inspecting the status to
>>>  >> make a commit in the superproject.
>>>  >
>>>  > Thanks. So i'll take a closer look at the diff core (but i suspect i'll
>>>  > need some time until i can come up with some patches because i don't know
>>>  > this part of git very well).
>>>
>>>
>>> I don't see a direct connection between "the user should be told about
>>>  untracked in the submodule before committing" and diffcore.  It is just
>>>  the matter of "git status" and "git commit" running another instance of
>>>  "git status" via run_command() interface in the submodule directory, no?
>>
>> You would need to rewrite file paths so that files in submodules are
>> also relative to the same directory as files in supermodule (I tried
>> to do that with GIT_WORK_TREE and needed to change a bit). Or you
>> could show each "git status" output separately, which does not look as
>> nice as the former in my opinion.
> 
> You could show output separately if you want, but I think that is a
> separate issue.
> 
> I was envisioning that the "git status" in submodule will be run with its
> recent --porcelain option, and "git status" or "git commit" would read it
> to postprocess and incorporate into its own output.

And i thought about printing just one line for each dirty submodule that
contains uncommitted and/or new files. I did not intend to list every
file, for the same reason a "git diff --submodule" only shows the first
line of the commit messages, not the actual differences of all changed
files in the submodule. I am not against being able to show all files
too, but i really would want to have an option to get a short output for
git gui and gitk.
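
As an illustration of the short form, the following Python sketch
condenses a submodule's porcelain status output into at most one line.
The function name and the exact wording of the line are hypothetical;
nothing here is an existing git output format.

```python
from typing import Optional


def summarize_submodule(name: str, porcelain_output: str) -> Optional[str]:
    """Return one summary line for a dirty submodule, or None if clean."""
    changed = 0
    untracked = 0
    for line in porcelain_output.splitlines():
        if not line.strip():
            continue
        if line.startswith("??"):
            # "??" marks an untracked (new, unignored) file.
            untracked += 1
        else:
            # Anything else is a staged or unstaged change to a tracked file.
            changed += 1
    if changed == 0 and untracked == 0:
        return None
    return "Submodule %s is dirty: %d changed, %d untracked" % (
        name, changed, untracked)
```

A front end such as git gui or gitk would then only need to display that
single line per submodule.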


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-06 18:32                           ` Jens Lehmann
@ 2010-01-06 20:01                             ` Junio C Hamano
  2010-01-06 21:19                               ` Jens Lehmann
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-01-06 20:01 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Nguyen Thai Ngoc Duy, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli,
	Avery Pennarun

Jens Lehmann <Jens.Lehmann@web.de> writes:

> Am 06.01.2010 18:55, schrieb Junio C Hamano:
>> I was envisioning that the "git status" in submodule will be run with its
>> recent --porcelain option, and "git status" or "git commit" would read it
>> to postprocess and incorporate into its own output.
>
> And i thought about printing just one line for each dirty submodule that
> contains uncommitted and/or new files. I did not intend to list every
> file, for the same reason a "git diff --submodule" only shows the first
> line of the commit messages, not the actual differences of all changed
> files in the submodule. I am not against being able to show all files
> too, but i really would want to have an option to get a short output for
> git gui and gitk.

I don't think what you are saying is inconsistent with "git status/commit
that reads from 'git status --porcelain' it runs in a submodule directory,
postprocesses it and incorporates it into its own output."  When the
sub-status reports changes, your "postprocess" would condense it down to
"this has a potential change that user could want to commit".  How the
dirtiness is shown is entirely up to the caller that detected that change.

Let's explain it in another way.

The original "diff" for a submodule entry was implemented by preparing a

	"Subproject commit %s\n"

line for the submodule commit recorded in the preimage and postimage, and
compare these as if they are one-line files.  When the postimage was work
tree, it looked at submodule's .git/HEAD to learn what to stuff in %s
there.

But nobody forced you to limit the check only to .git/HEAD in the
submodule.  To make the comparison richer, you could check if the
submodule directory is dirty (and we have already discussed the potential
definition of dirtiness earlier), and add "-dirty" in the string as well.
With such a change, if you make some changes to a file in the work tree of
the submodule after a clean "clone", "git diff" between the index and the
work tree would report:

	-Subproject commit 37bae10e38a66e4f1ddd5350daded00b21735126
	+Subproject commit 37bae10e38a66e4f1ddd5350daded00b21735126-dirty

The suggestion to read from "status --porcelain" that is run in the
submodule directory was about how to implement the part that determines
this "dirtiness" information, and not about how that dirtiness is
expressed in the output.  The above is an illustration that even the
traditional output format can be made aware of this submodule dirtiness
check.  "diff --submodule" can express that dirtiness information in any
way it wants.
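
For illustration only, the idea can be sketched outside of git's C code
in a few lines of Python. The helper names are made up, and treating any
line of porcelain output (including untracked "??" entries) as "dirty" is
one possible definition, in line with the earlier agreement that
untracked files should be reported.

```python
import subprocess


def submodule_status(sub_dir: str) -> str:
    """Run `git status --porcelain` inside the submodule work tree."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=sub_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_dirty(porcelain_output: str) -> bool:
    """Any reported entry counts as dirty; ignored files are not listed."""
    return any(line.strip() for line in porcelain_output.splitlines())


def subproject_line(commit: str, porcelain_output: str) -> str:
    """Build the one-line 'file' that diff compares for a submodule."""
    suffix = "-dirty" if is_dirty(porcelain_output) else ""
    return "Subproject commit %s%s\n" % (commit, suffix)
```

Comparing subproject_line(commit, "") with
subproject_line(commit, submodule_status("sub")) gives the "-"/"+" pair
shown above whenever the submodule work tree has changes.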


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-06 20:01                             ` Junio C Hamano
@ 2010-01-06 21:19                               ` Jens Lehmann
  0 siblings, 0 replies; 45+ messages in thread
From: Jens Lehmann @ 2010-01-06 21:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyen Thai Ngoc Duy, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli,
	Avery Pennarun

Am 06.01.2010 21:01, schrieb Junio C Hamano:
> Jens Lehmann <Jens.Lehmann@web.de> writes:
> 
>> Am 06.01.2010 18:55, schrieb Junio C Hamano:
>>> I was envisioning that the "git status" in submodule will be run with its
>>> recent --porcelain option, and "git status" or "git commit" would read it
>>> to postprocess and incorporate into its own output.
>>
>> And i thought about printing just one line for each dirty submodule that
>> contains uncommitted and/or new files. I did not intend to list every
>> file, for the same reason a "git diff --submodule" only shows the first
>> line of the commit messages, not the actual differences of all changed
>> files in the submodule. I am not against being able to show all files
>> too, but i really would want to have an option to get a short output for
>> git gui and gitk.
> 
> I don't think what you are saying is inconsistent with "git status/commit
> that reads from 'git status --porcelain' it runs in a submodule directory,
> postprocesses it and incorporates it into its own output."  When the
> sub-status reports changes, your "postprocess" would condense it down to
> "this has a potential change that user could want to commit".  How the
> dirtiness is shown is entirely up to the caller that detected that change.
> 
> Let's explain it in another way.
> 
> The original "diff" for a submodule entry was implemented by preparing a
> 
> 	"Subproject commit %s\n"
> 
> line for the submodule commit recorded in the preimage and postimage, and
> compare these as if they are one-line files.  When the postimage was work
> tree, it looked at submodule's .git/HEAD to learn what to stuff in %s
> there.
> 
> But nobody forced you to limit the check only to .git/HEAD in the
> submodule.  To make the comparison richer, you could check if the
> submodule directory is dirty (and we have already discussed the potential
> definition of dirtiness earlier), and add "-dirty" in the string as well.
> With such a change, if you make some changes to a file in the work tree of
> the submodule after a clean "clone", "git diff" between the index and the
> work tree would report:
> 
> 	-Subproject commit 37bae10e38a66e4f1ddd5350daded00b21735126
> 	+Subproject commit 37bae10e38a66e4f1ddd5350daded00b21735126-dirty
> 
> The suggestion to read from "status --porcelain" that is run in the
> submodule directory was about how to implement the part that determines
> this "dirtiness" information, and not about how that dirtiness is
> expressed in the output.  The above is an illustration that even the
> traditional output format can be made aware of this submodule dirtiness
> check.  "diff --submodule" can express that dirtiness information in any
> way it wants.

I see, we seem to agree again :-)

While looking into "git status" over the last few hours I became aware that
there is some infrastructure for calling "git submodule summary" (when
that is enabled via "git config status.submodulesummary"). I think this
can be extended to transfer the dirty information from "git diff
--submodule" (which can and should replace "git submodule summary" IMO)
into "git status".
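
To make the idea above concrete, here is a minimal sketch of the dirtiness
check as discussed: run "git status --porcelain" in the submodule directory,
condense its per-file output down to one yes/no answer, and append "-dirty"
to the traditional "Subproject commit" line. The function names and the
count_untracked switch are assumptions made for illustration only; this is
not how git implements it, and the definition of dirtiness is still open.

```python
import subprocess


def porcelain_status(submodule_path):
    """Return the raw output of `git status --porcelain` run in the submodule."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=submodule_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_dirty(porcelain_output, count_untracked=True):
    """Condense per-file status lines down to a single yes/no answer.

    count_untracked is a hypothetical knob: whether untracked ("??")
    entries make a submodule dirty is a policy decision.
    """
    for line in porcelain_output.splitlines():
        if not line.strip():
            continue
        if line.startswith("??") and not count_untracked:
            continue
        return True
    return False


def subproject_line(commit, dirty):
    """Format the one-line pseudo-file that the submodule diff compares."""
    return "Subproject commit %s%s\n" % (commit, "-dirty" if dirty else "")
```

A caller would feed porcelain_status(path) into is_dirty() and decide on its
own how to display the result; the "-dirty" suffix is only one possible way.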

I will send a patch for discussion tomorrow; I have to get some sleep now.


* Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk
  2010-01-05 23:13               ` Johannes Schindelin
@ 2010-01-07 11:04                 ` Nanako Shiraishi
  0 siblings, 0 replies; 45+ messages in thread
From: Nanako Shiraishi @ 2010-01-07 11:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Heiko Voigt, Jens Lehmann, Git Mailing List, Junio C Hamano,
	Shawn O. Pearce, Paul Mackerras, Lars Hjemli, Avery Pennarun

Quoting Johannes Schindelin <Johannes.Schindelin@gmx.de>:

> On Wed, 6 Jan 2010, Nanako Shiraishi wrote:
>
>> I found this other discussion in the design area enlightening.
>> 
>> http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47621
>
> Could you be so kind and summarize the result of the thread in something 
> like 2000 characters?

Sorry, but I only said "enlightening". There wasn't a conclusion that lets you stop thinking and just go ahead implementing the design specified in the thread, if that is what you are looking for.

Instead, let me tell you an example of what I found enlightening. It isn't a summary of the result. I don't think there was a *result*; otherwise somebody already would have implemented it.

I often wondered why 'git-submodule init' copies data to the .git/config file. If the .gitmodules file gives the default and I can use the .git/config file to override it, it seems stupid to copy entries between these files. I could just keep using data from the .gitmodules file until I need to override something.

Reading the thread made me realize how wrong I was. It became very clear why the .gitmodules file shouldn't even be the default that is read when no entry is in the .git/config file, and why the .git/config file should be the only thing that is used at runtime.

Unfortunately I can't summarize the reason in '2000 characters', so you need to read the thread yourself if you are interested. The key concept that I was missing was that remote repositories can move or change over time, and you may want to check out and interact with a very old version of your supermodule. The .gitmodules file checked out in such a case still records the old information. Treating the .gitmodules file as a hint and always looking into the .git/config file is part of the fundamental solution to that problem, but I didn't even realize that such an issue existed when I read the current discussion, until I found the old thread.
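
As a rough illustration of that point, here is a small sketch that models both files as plain dictionaries mapping a submodule name to its URL. The function names and the example URLs are made up for this sketch; it only mirrors the behaviour described above, where 'git-submodule init' copies a URL into .git/config when no entry exists there yet, and everything at runtime reads .git/config only.

```python
def submodule_init(gitmodules, config):
    """Copy URLs from .gitmodules into the local config.

    Entries already present in the local config are left alone, so
    .gitmodules acts only as a hint for submodules not yet initialized.
    """
    for name, url in gitmodules.items():
        config.setdefault(name, url)
    return config


def runtime_url(config, name):
    """Look up the URL at runtime from the local config only."""
    return config[name]


# First checkout: the hint from .gitmodules is copied into the config.
config = submodule_init({"lib": "git://old.example.org/lib.git"}, {})

# The remote repository moves later, so the user updates the local config.
config["lib"] = "git://new.example.org/lib.git"

# Checking out a very old supermodule commit brings back the old
# .gitmodules, but the local config still wins.
submodule_init({"lib": "git://old.example.org/lib.git"}, config)
```

If the old .gitmodules were consulted at runtime instead, the last step would point back at a URL that no longer exists.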

I think the 'git-submodule' script is mainly based on the 'three-level thing Steven Grimm suggested', but it doesn't seem to implement all the ideas in the thread yet. The 'git-submodule init' command gives no interactive prompt to suggest a URL. Nor does it record which URLs have been seen in a subproject.*.seen variable. But the issues that a high-level design must take into account look very well thought out already.

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/


Thread overview: 45+ messages
2010-01-02 15:33 RFC: display dirty submodule working directory in git gui and gitk Jens Lehmann
2010-01-04  9:44 ` Johannes Schindelin
2010-01-04 10:44   ` Heiko Voigt
2010-01-04 11:46     ` submodules, was " Johannes Schindelin
2010-01-04 18:29       ` Avery Pennarun
2010-01-04 19:14         ` Jens Lehmann
2010-01-04 17:04   ` Jens Lehmann
2010-01-04 22:29     ` submodules' shortcomings, was " Johannes Schindelin
2010-01-04 22:27       ` Shawn O. Pearce
2010-01-04 22:35         ` Avery Pennarun
2010-01-04 22:53       ` Avery Pennarun
2010-01-05  8:11       ` Jens Lehmann
2010-01-05  9:33         ` Junio C Hamano
2010-01-05 10:07           ` Johannes Schindelin
2010-01-05 11:57           ` Jens Lehmann
2010-01-05 18:31             ` Junio C Hamano
2010-01-05 20:01               ` Jens Lehmann
2010-01-06  1:04                 ` Junio C Hamano
2010-01-06 14:05                   ` Jens Lehmann
2010-01-06 17:01                     ` Junio C Hamano
2010-01-06 17:23                       ` Nguyen Thai Ngoc Duy
2010-01-06 17:55                         ` Junio C Hamano
2010-01-06 18:22                           ` Nguyen Thai Ngoc Duy
2010-01-06 18:32                           ` Jens Lehmann
2010-01-06 20:01                             ` Junio C Hamano
2010-01-06 21:19                               ` Jens Lehmann
2010-01-06 18:20                       ` Jens Lehmann
2010-01-05 23:02               ` Johannes Schindelin
2010-01-05  9:46         ` Johannes Schindelin
2010-01-05 12:19           ` Jens Lehmann
2010-01-05 14:27           ` Heiko Voigt
2010-01-05 15:07             ` Johan Herland
2010-01-05 15:30             ` Johannes Schindelin
2010-01-05 22:37             ` Nanako Shiraishi
2010-01-05 23:13               ` Johannes Schindelin
2010-01-07 11:04                 ` Nanako Shiraishi
2010-01-05 20:38       ` Pau Garcia i Quiles
2010-01-05 23:06         ` cmake, was Re: submodules' shortcomings Johannes Schindelin
2010-01-06  1:17           ` Pau Garcia i Quiles
2010-01-06  4:25             ` Miles Bader
2010-01-06  9:24             ` Johannes Schindelin
2010-01-04 17:51   ` RFC: display dirty submodule working directory in git gui and gitk Nguyen Thai Ngoc Duy
2010-01-04 18:40     ` Jens Lehmann
2010-01-04 19:05       ` Junio C Hamano
2010-01-04 19:21         ` Jens Lehmann
