* Re: Re: Relative submodule URLs
2014-08-22 16:00 ` Marc Branchaud
@ 2014-08-24 13:34 ` Heiko Voigt
2014-08-25 14:29 ` Robert Dailey
2014-08-25 13:48 ` Robert Dailey
2014-08-28 17:44 ` Marc Branchaud
2 siblings, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-24 13:34 UTC (permalink / raw)
To: Marc Branchaud; +Cc: Robert Dailey, Jonathan Nieder, Git, Jens Lehmann
Hi,
This mail got quite long, so to avoid 'tl;dr': I talk about two topics
in this mail:
* Submodule settings for default remote (complex, future)
* New --with-remote parameter for 'git submodule' (simple, now)
Depending on your interest you might want to skip the first part of the
email.
I think they are two separate topics. Please reply to only one of them
and remove the other. That way we split the thread here and do not mix
the two together anymore.
On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
> > I think you're on the right path. However I'd suggest something like
> > the following:
> >
> > [submodule]
> > remote = <remote_for_relative_submodules> (e.g. `upstream`)
>
> I think remote.default would be more generally useful, especially when
> working with detached checkouts.
That depends on your workflow. Especially for submodules, where the
default remote might change from branch to branch, this is not
necessarily true. The following drawbacks in relation to submodules come
to my mind:
* You cannot transport such configuration to the server. In case
you are developing on a branch which has changes in a forked
submodule that would be useful.
* When your development in superproject and submodule gets merged to a
stable branch (i.e. master) you also may not want that other remote
anymore. So a setting, that can be per branch, might be preferred.
* When your development gets pushed to a different remote the settings
do not change. I.e. once part of the upstream repository the
settings should possibly disappear.
* You might only want to fork a certain submodule (since that's the
only one you need to make changes in) in your branch. Then you need
this setting to be per submodule.
So, to sum up, a default remote setting that would be generally useful
for submodules needs the following properties (IMO):
* pushable
* per branch
* per remote
* per submodule
All of these being optional, so in case you have a local mirror,
including submodules, of some project in which you develop with your
team you might just want to set the default remote once for all
submodules.
I have not completely thought that through but the special ref idea[3]
described by Jonathan seems to make it possible to implement all these
properties.
> > [branch.<name>]
> > submoduleRemote = <remote_for_relative_submodule>
>
> If I understand correctly, you want this so that your branch can be a fork of
> only the super-repo while the submodules are not forked and so they should
> keep tracking their original repo.
>
> To me this seems to be going in the opposite direction of having branches
> recursively apply to submodules, which I think most of us want.
I disagree. While recursive branches might make sense in some
situations, in most they do not. Consider a project in which you use a
library which is separately maintained. You develop on featureA in your
project and discover a bug in the submodule which you fix on a branch
(which is then tracked in the submodule). Here it does not make sense
to call your branch in the submodule featureA, since the submodule has no
knowledge at all (and should not) about this featureA.
Having said that, for a simple workflow while developing a certain
feature, recursive branches do make sense. Let's say as a temporary local
branch you could have that featureA branch in your submodule and just
commit any changes you need in the submodule on that branch (including
extensions and stuff). Later in the process you divide up that branch in
the submodule into cleanups, bugfixes, extensions, ... to push it
upstream for review and integration.
> A branch should fork the entire repo, including its submodules. The
> implication is that if you want to push that branch somewhere, that somewhere
> needs to be able to accept the forks of the submodules *even if those
> submodules aren't changed in your branch* because at the very least the
> branch ref has to exist in the submodules' repositories.
I disagree here as well. As the distributed nature of git allows you to have
different remotes, I think it's perfectly legitimate to just fork the
repositories you need to change. It should be easy to work on a
repository that is forked in its entirety, but it should also be possible
(and properly supported) to only fork some submodules. I know it does
make the situation more complex, but I think we should properly define
the goal beforehand, so we do not exclude any use-cases. Then we can go
ahead and just implement the simpler stuff (like entire repo forks)
first, while making sure we do not block the more complex use-cases.
> With absolute-path submodules, the push is a simple as creating the branch
> ref in the submodules' "home" repositories -- even if the main "somewhere"
> you're pushing to isn't one of those repositories.
>
> With relative-path submodules, the push's target repo *must* also have the
> submodules in their proper places, so that they can get updated.
> Furthermore, if you clone a repo that has relative-path submodules you *must*
> also clone the submodules.
That is not true. You can have relative submodules and just clone/fetch
some from a different remote. It's just a question of how to
specify/transport this information.
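To illustrate why a relative URL does not tie a submodule to one server, here is a minimal sketch of how such a URL could be resolved against whichever superproject remote is chosen. The function name and the example.com/example.org URLs are made up for illustration, and only leading "./" and "../" components are handled; git's real resolution covers more cases.

```shell
# Simplified sketch: resolve a relative submodule URL against the URL of a
# superproject remote. Only leading "./" and "../" components are handled;
# this is an illustration, not git's actual implementation.
resolve_relative_url() {
	base=${1%/}   # remote URL of the superproject, trailing slash removed
	rel=$2        # URL as written in .gitmodules
	while :; do
		case "$rel" in
		../*) rel=${rel#../}; base=${base%/*} ;;  # go up one path component
		./*)  rel=${rel#./} ;;                    # stay at the same level
		*)    break ;;
		esac
	done
	printf '%s/%s\n' "$base" "$rel"
}

# The same .gitmodules entry resolves differently per remote (hypothetical URLs):
resolve_relative_url "https://example.com/upstream/super.git" "../lib.git"
resolve_relative_url "ssh://git@example.org/fork/super.git" "../lib.git"
```

The two calls print a URL under "upstream" and one under "fork" respectively, so the only missing piece is how the choice of remote is specified or transported.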
> Robert, I think what you'll say to this is that you still want your branch to
> track the latest submodules updates from their "home" repository. (BTW, I'm
> confused with how you're using the terms "upstream" and "origin". I'll use
> "home" to refer to the repository where everything starts from, and "fork"
> for the repository that your branch tracks.) Well, you get the updates you
> want when your branch tracks a ref in the "home" repository. But when your
> branch starts tracking a ref in another "fork" repository then you'll get the
> submodule updates in that ref's history from that "fork" repository.
>
> Once your branch is tracking the "fork" repository, if you do a pull you
> won't get any submodule updates because the fork's branch hasn't changed.
> You need to fetch (recursively) from the "home" repo to get the submodule
> updates (assuming one of the "home" repo's branches has updated its
> submodules). Then, with your branch checked out in the super-repo, if you
> check out the latest refs in your submodules git will tell you that you have
> uncommitted changes in your branch. The correct way to get submodule updates
> into your branch is to commit them. Even though you're doing a pull/rebase,
> there's nothing to rebase onto in the "fork" repository that has the updated
> submodules.
Let's please use the terms as Junio described them here[1]. Do not add
confusion by introducing new terms when we do not need them. If we need
new ones, let's introduce them properly. Maybe even define them in the
glossary. I feel for some things it is needed when talking about
submodules, so we talk about the same things.
New --with-remote parameter for 'git submodule'
------------------------------------------------
Having said all that about submodule settings, I think a much
simpler start is to go ahead with a commandline setting, like
Robert proposed here[2].
For that we do not have to worry about how it can be stored,
transported, defined per submodule or on a branch, since answers to this
are given at the commandline (and current repository state).
There are still open questions about this though:
* Should the name in the submodule be 'origin' even though you
specified --with-remote=somewhere? For me it's always confusing to
have the same/similar remotes named differently in different
repositories. That's why I try to keep the names the same in all my
clones of repositories (e.g. for my private, github, upstream
remotes).
* When you do a 'git submodule sync --with-remote=somewhere' should
the remote be added or replaced?
My opinion on these is:
The remote should be named as in the superproject so
--with-remote=somewhere adds/replaces the remote 'somewhere' in the
submodules named on the commandline (or all in case no submodule is
specified). In case of a fresh clone of the submodule, there would be no
origin but only a remote under the new name.
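The add/replace rule could be sketched roughly as follows. This is only an illustration of the proposal, not existing git behavior: --with-remote does not exist yet, plan_remote_cmd is a hypothetical helper, and it prints the 'git remote' command instead of running it.

```shell
# Sketch: decide how a hypothetical --with-remote=<name> could configure a
# submodule. Prints the git command instead of running it, so it is safe to try.
# Usage: plan_remote_cmd <name> <url> [existing remote names...]
plan_remote_cmd() {
	name=$1 url=$2
	shift 2
	for existing in "$@"; do
		if [ "$existing" = "$name" ]; then
			# A remote of that name already exists: replace its URL.
			echo "git remote set-url $name $url"
			return 0
		fi
	done
	# Fresh clone or unknown name: add the remote under the superproject's name.
	echo "git remote add $name $url"
}

plan_remote_cmd somewhere https://example.com/fork/lib.git origin
plan_remote_cmd somewhere https://example.com/fork/lib.git origin somewhere
```

The first call plans an 'add' because only 'origin' exists; the second plans a 'set-url' because 'somewhere' is already configured. In neither case is 'origin' touched.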
Would the --with-remote feature I describe be a feasible start for you
Robert? What do others think? Is the naming of the parameter
'--with-remote' alright?
Cheers Heiko
[1] http://article.gmane.org/gmane.comp.version-control.git/255512
[2] http://article.gmane.org/gmane.comp.version-control.git/255512
[3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Re: Relative submodule URLs
2014-08-24 13:34 ` Heiko Voigt
@ 2014-08-25 14:29 ` Robert Dailey
2014-08-25 14:32 ` Robert Dailey
2014-08-26 6:28 ` Heiko Voigt
0 siblings, 2 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-25 14:29 UTC (permalink / raw)
To: Heiko Voigt; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann
On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> New --with-remote parameter for 'git submodule'
> ------------------------------------------------
>
> While having said all that about submodule settings I think a much
> much simpler start is to go ahead with a commandline setting, like
> Robert proposed here[2].
>
> For that we do not have to worry about how it can be stored,
> transported, defined per submodule or on a branch, since answers to this
> are given at the commandline (and current repository state).
>
> There are still open questions about this though:
>
> * Should the name in the submodule be 'origin' even though you
> specified --with-remote=somewhere? For me its always confusing to
> have the same/similar remotes named differently in different
> repositories. That why I try to keep the names the same in all my
> clones of repositories (i.e. for my private, github, upstream
> remotes).
>
> * When you do a 'git submodule sync --with-remote=somewhere' should
> the remote be added or replaced.
>
> My opinion on these are:
>
> The remote should be named as in the superproject so
> --with-remote=somewhere adds/replaces the remote 'somewhere' in the
> submodules named on the commandline (or all in case no submodule is
> specified). In case of a fresh clone of the submodule, there would be no
> origin but only a remote under the new name.
>
> Would the --with-remote feature I describe be a feasible start for you
> Robert? What do others think? Is the naming of the parameter
> '--with-remote' alright?
>
> Cheers Heiko
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/255512
> [2] http://article.gmane.org/gmane.comp.version-control.git/255512
> [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values
Hi Heiko,
My last email response was in violation of your request to keep the
two topics separate, sorry about that. I started typing it this
weekend and completed the draft this morning, without having read this
response from you first. At this point my only intention was to start
discussion on a possible short-term solution. I realize the Git
developers are working hard on improving submodule workflow for the
long term. In addition I do not have the domain expertise to properly
make suggestions in regards to longer-term solutions, so I leave that
to you :-)
The --with-remote feature would allow me to begin using relative
submodules because:
On a per-submodule basis, I can specify the remote it will use. When I
fork a submodule and need to start tracking it, I can run `git
submodule sync --with-remote fork`, which will take my super repo's
'fork' remote, REPLACE 'origin' in the submodule with that URL, and
also redo the relative URL calculation. This is ideal since I use HTTP
at home (so I can use my proxy server to access git behind firewall at
work) and at work physically I use SSH for performance (to avoid HTTP
protocol). I also like the idea of "never" having to update my
submodule URLs again if the git server moves, domain name changes, or
whatever else.
Here is what I think would make the feature most usable. I think you
went over some of these ideas but I just want to clarify, to make sure
we're on the same page. Please correct me as needed.
1. Running `git submodule update --with-remote <name>` shall fail the
command unconditionally.
2. Using the `--with-remote` option on submodule `update` or `sync`
will fail if it detects absolute submodule URLs in .gitmodules.
3. Running `git submodule update --init --with-remote <name>` shall
fail the command ONLY if a submodule is being processed that is NOT
also being initialized.
4. The behavior of git submodule's `update` or `sync` commands
combined with `--with-remote` will REPLACE or CREATE the 'origin'
remote in each submodule it is run in. We will not allow the user to
configure what the submodule remote name will end up being (I think
this is current behavior and forces good practice; I consider `origin`
an adopted standard for git, and actually wish it was more enforced
for super projects as well!)
Let me know if I've missed anything. Once we clarify requirements I'll
attempt to start work on this during my free time. I'll start by
testing this through msysgit, since I do not have linux installed, but
I have Linux Mint running in a Virtual Machine so I can test on both
platforms as needed (I don't have a lot of experience on Linux
though).
I hope you won't mind me reaching out for questions as needed, however
I will attempt to be as resourceful as possible since I know you're
all busy. Thanks.
* Re: Re: Relative submodule URLs
2014-08-25 14:29 ` Robert Dailey
@ 2014-08-25 14:32 ` Robert Dailey
2014-08-26 6:28 ` Heiko Voigt
1 sibling, 0 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-25 14:32 UTC (permalink / raw)
To: Heiko Voigt; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann
On Mon, Aug 25, 2014 at 9:29 AM, Robert Dailey <rcdailey.lists@gmail.com> wrote:
> On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
>> New --with-remote parameter for 'git submodule'
>> ------------------------------------------------
>>
>> While having said all that about submodule settings I think a much
>> much simpler start is to go ahead with a commandline setting, like
>> Robert proposed here[2].
>>
>> For that we do not have to worry about how it can be stored,
>> transported, defined per submodule or on a branch, since answers to this
>> are given at the commandline (and current repository state).
>>
>> There are still open questions about this though:
>>
>> * Should the name in the submodule be 'origin' even though you
>> specified --with-remote=somewhere? For me its always confusing to
>> have the same/similar remotes named differently in different
>> repositories. That why I try to keep the names the same in all my
>> clones of repositories (i.e. for my private, github, upstream
>> remotes).
>>
>> * When you do a 'git submodule sync --with-remote=somewhere' should
>> the remote be added or replaced.
>>
>> My opinion on these are:
>>
>> The remote should be named as in the superproject so
>> --with-remote=somewhere adds/replaces the remote 'somewhere' in the
>> submodules named on the commandline (or all in case no submodule is
>> specified). In case of a fresh clone of the submodule, there would be no
>> origin but only a remote under the new name.
>>
>> Would the --with-remote feature I describe be a feasible start for you
>> Robert? What do others think? Is the naming of the parameter
>> '--with-remote' alright?
>>
>> Cheers Heiko
>>
>> [1] http://article.gmane.org/gmane.comp.version-control.git/255512
>> [2] http://article.gmane.org/gmane.comp.version-control.git/255512
>> [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values
>
> Hi Heiko,
>
> My last email response was in violation of your request to keep the
> two topics separate, sorry about that. I started typing it this
> weekend and completed the draft this morning, without having read this
> response from you first. At this point my only intention was to start
> discussion on a possible short-term solution. I realize the Git
> developers are working hard on improving submodule workflow for the
> long term. In addition I do not have the domain expertise to properly
> make suggestions in regards to longer-term solutions, so I leave that
> to you :-)
>
> The --with-remote feature would allow me to begin using relative
> submodules because:
>
> On a per-submodule basis, I can specify the remote it will use. When I
> fork a submodule and need to start tracking it, I can run `git
> submodule sync --with-remote fork`, which will take my super repo's
> 'fork' remote, REPLACE 'origin' in the submodule with that URL, and
> also redo the relative URL calculation. This is ideal since I use HTTP
> at home (so I can use my proxy server to access git behind firewall at
> work) and at work physically I use SSH for performance (to avoid HTTP
> protocol). I also like the idea of "never" having to update my
> submodule URLs again if the git server moves, domain name changes, or
> whatever else.
>
> Here is what I think would make the feature most usable. I think you
> went over some of these ideas but I just want to clarify, to make sure
> we're on the same page. Please correct me as needed.
>
> 1. Running `git submodule update --with-remote <name>` shall fail the
> command unconditionally.
> 2. Using the `--with-remote` option on submodule `update` or `sync`
> will fail if it detects absolute submodule URLs in .gitmodule
> 3. Running `git submodule update --init --with-remote <name>` shall
> fail the command ONLY if a submodule is being processed that is NOT
> also being initialized.
> 4. The behavior of git submodule's `update` or `sync` commands
> combined with `--with-remote` will REPLACE or CREATE the 'origin'
> remote in each submodule it is run in. We will not allow the user to
> configure what the submodule remote name will end up being (I think
> this is current behavior and forces good practice; I consider `origin`
> an adopted standard for git, and actually wish it was more enforced
> for super projects as well!)
>
> Let me know if I've missed anything. Once we clarify requirements I'll
> attempt to start work on this during my free time. I'll start by
> testing this through msysgit, since I do not have linux installed, but
> I have Linux Mint running in a Virtual Machine so I can test on both
> platforms as needed (I don't have a lot of experience on Linux
> though).
>
> I hope you won't mind me reaching out for questions as needed, however
> I will attempt to be as resourceful as possible since I know you're
> all busy. Thanks.
Thought of a few more:
5. If `--with-remote` is unspecified, behavior will continue as it
currently does (I'm not clear on the precedence here of various
options, but I assume: `remote.default` first, then
`branch.name.remote`)
6. `--with-remote` will take precedence over `remote.default` and
`branch.name.remote`.
I'll add more as I think of them... Sorry for the spam.
* Re: Re: Re: Relative submodule URLs
2014-08-25 14:29 ` Robert Dailey
2014-08-25 14:32 ` Robert Dailey
@ 2014-08-26 6:28 ` Heiko Voigt
2014-08-26 15:18 ` Robert Dailey
1 sibling, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-26 6:28 UTC (permalink / raw)
To: Robert Dailey; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann
On Mon, Aug 25, 2014 at 09:29:07AM -0500, Robert Dailey wrote:
> On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> > New --with-remote parameter for 'git submodule'
> > ------------------------------------------------
> >
> > While having said all that about submodule settings I think a much
> > much simpler start is to go ahead with a commandline setting, like
> > Robert proposed here[2].
> >
> > For that we do not have to worry about how it can be stored,
> > transported, defined per submodule or on a branch, since answers to this
> > are given at the commandline (and current repository state).
> >
> > There are still open questions about this though:
> >
> > * Should the name in the submodule be 'origin' even though you
> > specified --with-remote=somewhere? For me its always confusing to
> > have the same/similar remotes named differently in different
> > repositories. That why I try to keep the names the same in all my
> > clones of repositories (i.e. for my private, github, upstream
> > remotes).
> >
> > * When you do a 'git submodule sync --with-remote=somewhere' should
> > the remote be added or replaced.
> >
> > My opinion on these are:
> >
> > The remote should be named as in the superproject so
> > --with-remote=somewhere adds/replaces the remote 'somewhere' in the
> > submodules named on the commandline (or all in case no submodule is
> > specified). In case of a fresh clone of the submodule, there would be no
> > origin but only a remote under the new name.
> >
> > Would the --with-remote feature I describe be a feasible start for you
> > Robert? What do others think? Is the naming of the parameter
> > '--with-remote' alright?
> >
> > Cheers Heiko
> >
> > [1] http://article.gmane.org/gmane.comp.version-control.git/255512
> > [2] http://article.gmane.org/gmane.comp.version-control.git/255512
> > [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values
>
> Hi Heiko,
>
> My last email response was in violation of your request to keep the
> two topics separate, sorry about that. I started typing it this
> weekend and completed the draft this morning, without having read this
> response from you first.
That's fine, no problem.
> Here is what I think would make the feature most usable. I think you
> went over some of these ideas but I just want to clarify, to make sure
> we're on the same page. Please correct me as needed.
>
> 1. Running `git submodule update --with-remote <name>` shall fail the
> command unconditionally.
I am not sure, but I think you mean
    git submodule update --with-remote=<name>
with the equals sign; without it you would be naming the submodule paths
to update. And no, I think that should just add the remote <name> to all
submodules that would be updated and do the normal update operation on
them (with the new remote, of course).
> 2. Using the `--with-remote` option on submodule `update` or `sync`
> will fail if it detects absolute submodule URLs in .gitmodule
Yes, almost. Since you can have a mixture, I suggest failing only if the
submodules that would be processed have an absolute URL in them. If the
processed submodules are all relative, it can go ahead.
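A rough sketch of that check, under the assumption that "relative" means the URL starts with "./" or "../". The helper names and URLs are made up for illustration; the idea is to look at all submodules selected for processing before touching any of them.

```shell
# Sketch: treat a URL as relative only if it starts with "./" or "../".
is_relative_url() {
	case "$1" in
	./*|../*) return 0 ;;
	*)        return 1 ;;
	esac
}

# Check every URL that is about to be processed before touching anything.
# Prints the first absolute URL found and returns non-zero in that case.
check_all_relative() {
	for url in "$@"; do
		if ! is_relative_url "$url"; then
			echo "absolute URL: $url"
			return 1
		fi
	done
	return 0
}

check_all_relative ../lib.git ./vendor/zlib.git && echo "ok to proceed"
check_all_relative ../lib.git https://example.com/other.git ../more.git || echo "refusing"
```

Checking up front like this means nothing is half-updated when an absolute URL shows up in the middle of the list. Whether to do that or to stop at the first offending submodule is still open.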
> 3. Running `git submodule update --init --with-remote <name>` shall
> fail the command ONLY if a submodule is being processed that is NOT
> also being initialized.
No, since the --init flag just tells update to initialize submodules
on-demand. It should just go ahead the same way as without
--with-remote.
> 4. The behavior of git submodule's `update` or `sync` commands
> combined with `--with-remote` will REPLACE or CREATE the 'origin'
> remote in each submodule it is run in. We will not allow the user to
> configure what the submodule remote name will end up being (I think
> this is current behavior and forces good practice; I consider `origin`
> an adopted standard for git, and actually wish it was more enforced
> for super projects as well!)
No, please read my email again carefully. I specifically was describing
the opposite. --with-remote=<name> creates/replaces the remote <name> in
the submodule. I do not see a benefit in restricting the user from
creating different remote names in the submodule. I think it would be
more confusing if the remote 'origin' in the superproject does not point
to the same location as 'origin' in the submodule.
> Let me know if I've missed anything. Once we clarify requirements I'll
> attempt to start work on this during my free time. I'll start by
> testing this through msysgit, since I do not have linux installed, but
> I have Linux Mint running in a Virtual Machine so I can test on both
> platforms as needed (I don't have a lot of experience on Linux
> though).
I think it does not matter which development environment you use. In my
experience though Linux is around 30x faster when it comes to the
typical operations you do when developing git. Especially when running
the testsuite, that is the difference between a few hours and a few minutes.
> I hope you won't mind me reaching out for questions as needed, however
> I will attempt to be as resourceful as possible since I know you're
> all busy. Thanks.
No problem, just post here and we will see.
On Mon, Aug 25, 2014 at 09:32:27AM -0500, Robert Dailey wrote:
> Thought of a few more:
>
> 5. If `--with-remote` is unspecified, behavior will continue as it
> currently does (I'm not clear on the precedence here of various
> options, but I assume: `remote.default` first, then
> `branch.name.remote`)
Yes. And I hope the testsuite covers this case well enough, so run it
to make sure. Have a look at what kind of tests exist and
maybe even write one or two for the code you change. That's a good way
to practice and also makes sure you do not break existing behavior.
Johan Herland also recently collected some update tests here[1].
AFAIK, remote.default was WIP and does not exist yet. So you only need
to worry
> 6. `--with-remote` will take precedence over `remote.default` and
> `branch.name.remote`.
Yes.
> I'll add more as I think of them... Sorry for the spam.
I think the code for the new commandline switch will not be too
complicated/big, so I think it's best if you just go ahead, write it and
then send a patch to the list once you are happy. It's common to add an
RFC tag if you just want some comments on your current status and do not
think it's ready for inclusion yet. Expect it to go a few rounds until
everything is ironed out.
Cheers Heiko
[1] http://thread.gmane.org/gmane.comp.version-control.git/246312
* Re: Re: Re: Relative submodule URLs
2014-08-26 6:28 ` Heiko Voigt
@ 2014-08-26 15:18 ` Robert Dailey
2014-08-26 20:34 ` Heiko Voigt
0 siblings, 1 reply; 25+ messages in thread
From: Robert Dailey @ 2014-08-26 15:18 UTC (permalink / raw)
To: Heiko Voigt; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann
On Tue, Aug 26, 2014 at 1:28 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
>> Hi Heiko,
>>
>> My last email response was in violation of your request to keep the
>> two topics separate, sorry about that. I started typing it this
>> weekend and completed the draft this morning, without having read this
>> response from you first.
>
> Thats fine, no problem.
>
>> Here is what I think would make the feature most usable. I think you
>> went over some of these ideas but I just want to clarify, to make sure
>> we're on the same page. Please correct me as needed.
>>
>> 1. Running `git submodule update --with-remote <name>` shall fail the
>> command unconditionally.
>
> I am not sure but I think you mean
>
> git submodule update --with-remote=<name>
>
> With the equals sign, without it you would name the submodule paths to
> update. No I think that should just add the remote <name> to all
> submodules that would be updated and do the normal update operation on
> them (with the new remote of course).
I'm not sure about Linux, but at least with msysgit on Windows, typing
a two-dash option (such as --with-remote) makes command-line
parsing use the next positional parameter as the argument for
it. I've seen this work the same way with argparse in Python too. In
my experience the command line has always worked that way, though I'm not
sure whether that is by design. I never use equal signs with git commands
and have never had a problem.
For example:
git rebase --onto release/1.0 HEAD~3 HEAD
The `--onto` option knows to use `release/1.0` as its parameter.
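Whether the value may follow as a separate word depends on how the option loop of the command is written. As a minimal sketch, assuming a shell-script option loop and the proposed (not yet existing) --with-remote option, both spellings could be accepted like this. parse_args and the variable names are hypothetical.

```shell
# Sketch of a shell option loop that accepts both spellings of a hypothetical
# --with-remote option. Sets $with_remote and leaves the rest in $paths.
parse_args() {
	with_remote= paths=
	while [ $# -gt 0 ]; do
		case "$1" in
		--with-remote=*)
			with_remote=${1#--with-remote=} ;;   # stuck form: value after "="
		--with-remote)
			[ $# -ge 2 ] || { echo "missing value" >&2; return 1; }
			with_remote=$2
			shift ;;                             # separate form consumes one more word
		*)
			paths="$paths $1" ;;                 # anything else is a submodule path
		esac
		shift
	done
	paths=${paths# }
}

parse_args --with-remote=fork lib/a && echo "$with_remote | $paths"
parse_args --with-remote fork lib/a && echo "$with_remote | $paths"
```

Both calls print "fork | lib/a". If only the "=" branch were implemented, the word after --with-remote would instead be taken as a submodule path.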
>> 2. Using the `--with-remote` option on submodule `update` or `sync`
>> will fail if it detects absolute submodule URLs in .gitmodule
>
> Yes, almost. Since you can have a mixture I suggest to only fail if the
> submodules that would be processed have an absolute url in them. If
> processed submodules are all relative it can go ahead.
For example if it processes 3 submodules in the following order:
1. relative
2. absolute
3. relative
Should it fail before or after processing the 3rd relative submodule?
I was thinking it would fail while trying to sync/update the 2nd one
(which is absolute) and stop before processing the 3rd.
>> 3. Running `git submodule update --init --with-remote <name>` shall
>> fail the command ONLY if a submodule is being processed that is NOT
>> also being initialized.
>
> No since the --init flag just tells update to initialize submodules
> on-demand. It should just go ahead the same way as without
> --with-remote.
But doesn't the on-demand initialization need to evaluate relative
URLs and convert them to absolute based on the .gitmodules
configuration? I thought the idea was to make `--with-remote` invalid
for initialization/sync of absolute URLs.
In other words if I did:
git submodule init --with-remote fork my-submodule-dir
and if my-submodule-dir was not relative in .gitmodules, then the
`--with-remote` flag becomes useless. We could fail silently but for
educational purposes to the user I thought we were failing in these
scenarios. Maybe I misunderstood your original intent with the
failures? Is init not doing the relative to absolute evaluation like
I'm thinking? Please correct me where I'm wrong.
>> 4. The behavior of git submodule's `update` or `sync` commands
>> combined with `--with-remote` will REPLACE or CREATE the 'origin'
>> remote in each submodule it is run in. We will not allow the user to
>> configure what the submodule remote name will end up being (I think
>> this is current behavior and forces good practice; I consider `origin`
>> an adopted standard for git, and actually wish it was more enforced
>> for super projects as well!)
>
> No please carefully read my email again. I specifically was describing
> the opposite. --with-remote=<name> creates/replaces the remote <name> in
> the submodule. I do not see a benefit in restricting the user from
> creating different remote names in the submodule. I think it would be
> more confusing if the remote 'origin' in the superproject does not point
> to the same location as 'origin' in the submodule.
Well the reason why I said it would be 'origin' is so that the
submodule knows which remote to use internally during an update. I'm
assuming 'update' uses 'origin' internally in the submodule to know
which remote to pull from. My understanding of how `git submodule
update` knows which URL to pull from is probably incorrect. I'm not
familiar with the internal mechanics of how this works. Perhaps you
could explain or send me to some reading material on it?
* Re: Re: Re: Re: Relative submodule URLs
2014-08-26 15:18 ` Robert Dailey
@ 2014-08-26 20:34 ` Heiko Voigt
0 siblings, 0 replies; 25+ messages in thread
From: Heiko Voigt @ 2014-08-26 20:34 UTC (permalink / raw)
To: Robert Dailey; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann
On Tue, Aug 26, 2014 at 10:18:48AM -0500, Robert Dailey wrote:
> On Tue, Aug 26, 2014 at 1:28 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> >> My last email response was in violation of your request to keep the
> >> two topics separate, sorry about that. I started typing it this
> >> weekend and completed the draft this morning, without having read this
> >> response from you first.
> >
> > That's fine, no problem.
> >
> >> Here is what I think would make the feature most usable. I think you
> >> went over some of these ideas but I just want to clarify, to make sure
> >> we're on the same page. Please correct me as needed.
> >>
> >> 1. Running `git submodule update --with-remote <name>` shall fail the
> >> command unconditionally.
> >
> > I am not sure but I think you mean
> >
> > git submodule update --with-remote=<name>
> >
> > With the equals sign, without it you would name the submodule paths to
> > update. No I think that should just add the remote <name> to all
> > submodules that would be updated and do the normal update operation on
> > them (with the new remote of course).
>
> I'm not sure about Linux but at least with msysgit on Windows, typing
> a two-dash option (such as --with-remote) forces command line
> evaluation to use the next placement parameter as the parameter for
> it. I've seen this work the same way with argparse in python too. In
> my experience, command line has worked that way, I'm not sure if that
> is by design or not though. I never use equal signs with git commands,
> never had a problem for some reason.
>
> For example:
>
> git rebase --onto release/1.0 head~3 head
>
> The `--onto` option knows to use `release/1.0` as its parameter.
Whether you are on Windows or Linux does not make a difference here. I just
realized we are quite inconsistent:
$ git grep -E -e "--\w+=<\w+>" -- Documentation/ | wc -l
226
$ git grep -E -e "--\w+ <\w+>" -- Documentation/ | wc -l
75
I would prefer the first though since that one is used more often. But
we can leave that for later, once we have some code to talk about.
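Both spellings can be handled by the same option loop. A minimal sketch in plain POSIX shell, assuming the proposed `--with-remote` option (not an existing git option) and a hypothetical helper name `parse_with_remote`:

```shell
# Hypothetical helper: prints the value given to the proposed --with-remote
# option, accepting both "--with-remote=<name>" and "--with-remote <name>".
parse_with_remote() {
	remote=
	while [ $# -gt 0 ]; do
		case "$1" in
		--with-remote=*)
			# Attached form: strip the option name and the equals sign.
			remote=${1#--with-remote=}
			;;
		--with-remote)
			# Separate-word form: the value is the next argument.
			[ $# -ge 2 ] || return 1
			remote=$2
			shift
			;;
		*)
			# First non-option argument: the submodule paths start here.
			break
			;;
		esac
		shift
	done
	printf '%s\n' "$remote"
}

parse_with_remote --with-remote=fork lib/foo
parse_with_remote --with-remote fork lib/foo
```

In this sketch the word following a bare `--with-remote` is always consumed as the remote name, so it is never treated as a submodule path. That is a design choice of the sketch, not something settled in this thread.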
> >> 2. Using the `--with-remote` option on submodule `update` or `sync`
> >> will fail if it detects absolute submodule URLs in .gitmodules
> >
> > Yes, almost. Since you can have a mixture I suggest to only fail if the
> > submodules that would be processed have an absolute url in them. If
> > processed submodules are all relative it can go ahead.
>
> For example if it processes 3 submodules in the following order:
>
> 1. relative
> 2. absolute
> 3. relative
>
> Should it fail before or after processing the 3rd relative submodule?
> I was thinking it would fail while trying to sync/update the 2nd one
> (which is absolute) and stop before processing the 3rd.
For consistency I would prefer if it fails right from the beginning in
this situation since the command can not be completed.
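Failing right from the beginning amounts to checking every URL before touching any submodule. A rough sketch, assuming the URLs of the submodules to be processed have already been collected and that a leading `./` or `../` is what marks a URL as relative; the function name `all_urls_relative` is made up for illustration:

```shell
# Hypothetical pre-check: succeed only if every given URL is relative.
# A leading "./" or "../" is what marks a submodule URL as relative.
all_urls_relative() {
	for url in "$@"; do
		case "$url" in
		./*|../*)
			;;
		*)
			echo "absolute submodule url: $url" >&2
			return 1
			;;
		esac
	done
	return 0
}

# The relative/absolute/relative mix from the example above.
if all_urls_relative ../a.git https://example.com/b.git ../c.git; then
	echo "ok to process"
else
	echo "refusing to start"
fi
```

Because the check runs over the whole list first, no submodule is synced or updated when one of them turns out to be absolute.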
> >> 3. Running `git submodule update --init --with-remote <name>` shall
> >> fail the command ONLY if a submodule is being processed that is NOT
> >> also being initialized.
> >
> > No since the --init flag just tells update to initialize submodules
> > on-demand. It should just go ahead the same way as without
> > --with-remote.
>
> But doesn't the on-demand initialization need to evaluate relative
> URLs and convert them to absolute based on the .gitmodules
> configuration? I thought the idea was to make `--with-remote` invalid
> for initialization/sync of absolute URLs.
>
> In other words if I did:
>
> git submodule init --with-remote fork my-submodule-dir
>
> and if my-submodule-dir was not relative in .gitmodules, then the
> `--with-remote` flag becomes useless. We could fail silently but for
> educational purposes to the user I thought we were failing in these
> scenarios. Maybe I misunderstood your original intent with the
> failures? Is init not doing the relative to absolute evaluation like
> I'm thinking? Please correct me where I'm wrong.
Yes it does the relative to absolute evaluation. But that is a different
topic. For absolute urls in .gitmodules it should fail, but you were
talking about --init in general and in general that should not fail IMO.
So e.g.
git submodule update --init --with-remote=<name>
when all submodule urls are relative in .gitmodules and some submodules
have already been initialized should succeed.
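For the relative to absolute evaluation mentioned above, here is a simplified sketch of the idea. It assumes a plain `scheme://host/path` style base URL and ignores the special cases that the real code in git-submodule.sh handles; `resolve_relative_url` here is a stand-in written for illustration, not a copy of that code:

```shell
# Simplified sketch: resolve a relative submodule URL against the URL of the
# superproject's remote. Each leading "../" drops one trailing path
# component of the base; a leading "./" is simply dropped.
resolve_relative_url() {
	base=${1%/}
	url=$2
	while :; do
		case "$url" in
		../*)
			url=${url#../}
			base=${base%/*}
			;;
		./*)
			url=${url#./}
			;;
		*)
			break
			;;
		esac
	done
	printf '%s/%s\n' "$base" "$url"
}

resolve_relative_url https://example.com/org/super.git ../sub.git
resolve_relative_url https://fork.example.com/me/super.git ../sub.git
```

The same `../sub.git` entry resolves to a different absolute URL depending on which remote supplies the base, which is why choosing the remote matters only for relative entries.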
> >> 4. The behavior of git submodule's `update` or `sync` commands
> >> combined with `--with-remote` will REPLACE or CREATE the 'origin'
> >> remote in each submodule it is run in. We will not allow the user to
> >> configure what the submodule remote name will end up being (I think
> >> this is current behavior and forces good practice; I consider `origin`
> >> an adopted standard for git, and actually wish it was more enforced
> >> for super projects as well!)
> >
> > No please carefully read my email again. I specifically was describing
> > the opposite. --with-remote=<name> creates/replaces the remote <name> in
> > the submodule. I do not see a benefit in restricting the user from
> > creating different remote names in the submodule. I think it would be
> > more confusing if the remote 'origin' in the superproject does not point
> > to the same location as 'origin' in the submodule.
>
> Well the reason why I said it would be 'origin' is so that the
> submodule knows which remote to use internally during an update. I'm
> assuming 'update' uses 'origin' internally in the submodule to know
> which remote to pull from. My understanding of how `git submodule
> update` knows which URL to pull from is probably incorrect. I'm not
> familiar with the internal mechanics of how this works. Perhaps you
> could explain or send me to some reading material on it?
Yes your assumptions are almost true. Except that submodule update does
not do a pull but a fetch (without any arguments) by default. But your
implementation could change (I actually first thought that was already
the case) the fetch in submodule update to fetch from all remotes before
updating if there is no remote specified. But I have not thought that
through. Additionally the implementation of --with-remote could be used
to specify from which remote to fetch. Regarding the reading I can only
suggest the code of git-submodule.sh to you.
I can understand that altering the 'origin' remote to also save the
remote for future fetches would help you in your case but we have to
keep other workflows in mind and not all people (me included) want only
one remote in their submodule. --with-remote should also help those
people, which it does when it adds a remote under that name. Changing the
default from which a submodule is fetched by submodule update is a
separate topic for the additional configuration which we split from this
topic. I think.
Cheers Heiko
* Re: Relative submodule URLs
2014-08-22 16:00 ` Marc Branchaud
2014-08-24 13:34 ` Heiko Voigt
@ 2014-08-25 13:48 ` Robert Dailey
2014-08-28 17:44 ` Marc Branchaud
2 siblings, 0 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-25 13:48 UTC (permalink / raw)
To: Marc Branchaud; +Cc: Jonathan Nieder, Git, Jens Lehmann, Heiko Voigt
On Fri, Aug 22, 2014 at 11:00 AM, Marc Branchaud <marcnarc@xiplink.com> wrote:
> A couple of years ago I started to work on such a thing ([1] [2] [3]), mainly
> because when we tried to change to relative submodules we got bitten when
> someone used clone's -o option so that his super-repo had no "origin" remote
> *and* his was checked out on a detached HEAD. So get_default_remote() failed
> for him.
>
> I didn't have time to complete the work -- it ended up being quite involved.
> But Junio did come up with an excellent transition plan [4] for adopting a
> default remote setting.
>
> [1] (v0) http://thread.gmane.org/gmane.comp.version-control.git/200145
> [2] (v1) http://thread.gmane.org/gmane.comp.version-control.git/201065
> [3] (v2) http://thread.gmane.org/gmane.comp.version-control.git/201306
> [4] http://article.gmane.org/gmane.comp.version-control.git/201332
>
>> I think you're on the right path. However I'd suggest something like
>> the following:
>>
>> [submodule]
>> remote = <remote_for_relative_submodules> (e.g. `upstream`)
>
> I think remote.default would be more generally useful, especially when
> working with detached checkouts.
Honestly speaking I don't use remote.default, even now that I know
about it thanks to the discussion ongoing here. The reason is that
sometimes I push my branches to origin, sometimes I push them to my
fork. I like explicit control as to which one I push to. I also sync
my git config file to dropbox and I use it on multiple projects and
platforms. I don't use the same push destination workflow on all
projects. It seems to get in the way of my workflow more than it
helps. I really only ever have two needs:
1. Push explicitly to my remote (e.g. `git push fork` or `git push origin`)
2. Push to the tracked branch (e.g. `git push`)
I'm also not sure how `push.default = simple` conflicts with the usage
of `remote.default`, since in the tracked-repo case, you must
explicitly specify the source ref to push. Is this behavior documented
somewhere?
> (For the record, I would also be happy if clone got rid of its -o option and
> "origin" became the sacred, reserved remote name (perhaps translated into
> other languages as needed) that clone always uses no matter what.)
>
>> [branch.<name>]
>> submoduleRemote = <remote_for_relative_submodule>
>
> If I understand correctly, you want this so that your branch can be a fork of
> only the super-repo while the submodules are not forked and so they should
> keep tracking their original repo.
That's correct. But this is case-by-case. Sometimes I make a change
where I want the submodule forked (rare), most times I don't.
Sometimes I can get away with pushing changes to the submodule and
worrying about it later since I know the submodule ref won't move
forward unless someone does update --remote (which isn't often or only
done as needed).
> To me this seems to be going in the opposite direction of having branches
> recursively apply to submodules, which I think most of us want.
>
> A branch should fork the entire repo, including its submodules. The
> implication is that if you want to push that branch somewhere, that somewhere
> needs to be able to accept the forks of the submodules *even if those
> submodules aren't changed in your branch* because at the very least the
> branch ref has to exist in the submodules' repositories.
There are many levels on which this can apply. When it comes to
checkouts and such, I agree. However, how will this impact *creating*
branches? What about forking? Do you expect submodule forking &
branching to be automatic as well? Based on your description, it seems
so (although a new branch doesn't necessarily have to correspond to a
new fork, unless I'm misunderstanding you). This seems difficult to
do, especially the forking part since you would need an API for this
(Github, Atlassian Stash, etc), unless you are thinking of something
clever like local/relative forks.
However the inconvenience of forking manually isn't the main reason
why I avoid forking submodules. It's the complication of pull
requests. There is no uniformity there, which is unfortunate.
Recursive pull requests are something outside the scope of git, I
realize that, but it would still be nice. However the suggestion you
make here lays the foundation for that I think.
> With absolute-path submodules, the push is a simple as creating the branch
> ref in the submodules' "home" repositories -- even if the main "somewhere"
> you're pushing to isn't one of those repositories.
>
> With relative-path submodules, the push's target repo *must* also have the
> submodules in their proper places, so that they can get updated.
> Furthermore, if you clone a repo that has relative-path submodules you *must*
> also clone the submodules.
>
> Robert, I think what you'll say to this is that you still want your branch to
> track the latest submodules updates from their "home" repository. (BTW, I'm
> confused with how you're using the terms "upstream" and "origin". I'll use
> "home" to refer to the repository where everything starts from, and "fork"
> for the repository that your branch tracks.) Well, you get the updates you
> want when your branch tracks a ref in the "home" repository. But when your
> branch starts tracking a ref in another "fork" repository then you'll get the
> submodule updates in that ref's history from that "fork" repository.
My usage of 'upstream' and 'origin' were wrong. I don't use upstream
anymore, based on the explanations I've received here. I use the
following now:
origin = my central repository (authoritative)
fork = My fork of the central repo
I like your idea of forking/branching on submodules being recursive
based on the super repo, but I just don't see how this is possible.
How would git tell github to fork, for example? And would that also
work on Stash?
> Once your branch is tracking the "fork" repository, if you do a pull you
> won't get any submodule updates because the fork's branch hasn't changed.
> You need to fetch (recursively) from the "home" repo to get the submodule
> updates (assuming one of the "home" repo's branches has updated its
> submodules). Then, with your branch checked out in the super-repo, if you
> check out the latest refs in your submodules git will tell you that you have
> uncommitted changes in your branch. The correct way to get submodule updates
> into your branch is to commit them. Even though you're doing a pull/rebase,
> there's nothing to rebase onto in the "fork" repository that has the updated
> submodules.
I like your ideas, assuming they are technically possible. They sound
like great solutions for the long term. However for now, the whole
process of working with remotes is very confusing. At first it was
complicated when it came to triangle workflow. Mostly because the way
you set push.default changes completely between the two, especially
when combined with various workflows.
Add on top of that the complexity of workflows for submodules, and it
becomes a complete mess. Maybe for you guys who actively develop and
understand git's internals it isn't so bad. However I don't have that
domain knowledge, so I only have a "user" perspective on the matter.
`remote.default` sounds nice but how do I use it based on my response
in my first paragraph above?
Thanks guys, this is great discussion.
* Re: Relative submodule URLs
2014-08-22 16:00 ` Marc Branchaud
2014-08-24 13:34 ` Heiko Voigt
2014-08-25 13:48 ` Robert Dailey
@ 2014-08-28 17:44 ` Marc Branchaud
2014-08-28 19:35 ` Heiko Voigt
2 siblings, 1 reply; 25+ messages in thread
From: Marc Branchaud @ 2014-08-28 17:44 UTC (permalink / raw)
To: Robert Dailey, Jonathan Nieder; +Cc: Git, Jens Lehmann, Heiko Voigt
Sorry for dropping out of the conversation; the last few days were a bit hectic.
Regarding recursive branching, I agree that a super-repo's branch names are
not necessarily appropriate for its submodules, and that Heiko's "simple
workflow" is a workable base to build upon. More thought is needed here, but
that's for another day.
Regarding remote.default, Robert please understand that the feature doesn't
exist, and the idea is to only serve as a fallback when the current methods
for remote selection end up resorting to the hardcoded "origin" name. More
thought is also needed here, but not today.
Both Heiko and Robert took issue with this statement of mine:
On 14-08-22 12:00 PM, Marc Branchaud wrote:
> A branch should fork the entire repo, including its submodules. The
> implication is that if you want to push that branch somewhere, that
> somewhere needs to be able to accept the forks of the submodules *even
> if those submodules aren't changed in your branch* because at the very
> least the branch ref has to exist in the submodules' repositories.
Heiko said: "It should be easy to work on a repository that is forked in its
entirety, but it should also be possible (and properly supported) to only
fork some submodules."
You're right, I overstated it when I said that the branch ref has to exist in
the unchanged submodules. The super-repo branch records which submodules it
updates, and when pushing the branch somewhere only those submodules' changes
need to be pushed.
Robert asked: "How will this impact *creating* branches? What about forking?
Do you expect submodule forking & branching to be automatic as well? ... This
seems difficult to do, especially the forking part since you would need an
API for this (Github, Atlassian Stash, etc), unless you are thinking of
something clever like local/relative forks."
I meant "fork" in the local-branch sense: The branch represents a topic in
the repository, and it should encompass the entire repository including its
submodules (just like the branch encompasses all the files in the repository,
even though the branch's commits only change a subset of those files). I
think you're talking about "fork" in the sense of setting up a mirror of a
repository. I agree that there aren't really any tools for automatically
doing that with repositories that contain relative-path submodules. I think
"git clone" could learn to do it, though.
Heiko also said this:
> On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
>> With relative-path submodules, the push's target repo *must* also have the
>> submodules in their proper places, so that they can get updated.
>> Furthermore, if you clone a repo that has relative-path submodules you
>> *must* also clone the submodules.
>
> That is not true. You can have relative submodules and just clone/fetch
> some from a different remote. It's just a question of how to
> specify/transport this information.
I meant that more as a general guideline than some kind of physical law.
Sure, it's possible to scatter the submodules across all sorts of hosts, but
it's not a good idea. When it comes to relative-path submodules, pushing and
fetching submodule changes in the super-repo should just involve the one
remote host (whatever way that's determined). This keeps things tractable,
because otherwise your branch's changes are scattered among many different
hosts and you end up considering weird things like "this part of the branch's
changes are on host A but this other part are on host B, so let's record that
somewhere, oh but what if host B is down when I'm trying to fetch, but I know
that host C has the changes too so why don't I just fetch what I want from
there".
It's a nightmare. It's infinitely better to treat a repository and its
relative-path submodules as an atomic unit, so that any remote that hosts the
repository also hosts the submodules. When pushing a branch with submodule
changes, expect to find those submodules on the target remote and update
them. Regardless of how the target remote is determined. Same thing for
fetching. It's just so much simpler to work this way.
So please, let's not try to specify submodule remotes per-branch or make that
info pushable. It's enough for a branch's local configuration to say that it
tracks fetch/pull refs on different remotes. The rest should flow from that.
M.
* Re: Re: Relative submodule URLs
2014-08-28 17:44 ` Marc Branchaud
@ 2014-08-28 19:35 ` Heiko Voigt
2014-08-29 15:09 ` Marc Branchaud
0 siblings, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-28 19:35 UTC (permalink / raw)
To: Marc Branchaud; +Cc: Robert Dailey, Jonathan Nieder, Git, Jens Lehmann
On Thu, Aug 28, 2014 at 01:44:18PM -0400, Marc Branchaud wrote:
> Heiko also said this:
> > On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
> >> With relative-path submodules, the push's target repo *must* also have the
> >> submodules in their proper places, so that they can get updated.
> >> Furthermore, if you clone a repo that has relative-path submodules you
> >> *must* also clone the submodules.
> >
> > That is not true. You can have relative submodules and just clone/fetch
> > some from a different remote. It's just a question of how to
> > specify/transport this information.
>
> I meant that more as a general guideline than some kind of physical law.
> Sure, it's possible to scatter the submodules across all sorts of hosts, but
> it's not a good idea. When it comes to relative-path submodules, pushing and
> fetching submodule changes in the super-repo should just involve the one
> remote host (whatever way that's determined). This keeps things tractable,
> because otherwise your branch's changes are scattered among many different
> hosts and you end up considering weird things like "this part of the branch's
> changes are on host A but this other part are on host B, so let's record that
> somewhere, oh but what if host B is down when I'm trying to fetch, but I know
> that host C has the changes too so why don't I just fetch what I want from
> there".
>
> It's a nightmare. It's infinitely better to treat a repository and its
> relative-path submodules as an atomic unit, so that any remote that hosts the
> repository also hosts the submodules. When pushing a branch with submodule
> changes, expect to find those submodules on the target remote and update
> them. Regardless of how the target remote is determined. Same thing for
> fetching. It's just so much simpler to work this way.
You are right, it's simpler. But I would not say "better". Depending on
your project it might be "better" to just fork some submodules.
> So please, let's not try to specify submodule remotes per-branch or make that
> info pushable. It's enough for a branch's local configuration to say that it
> tracks fetch/pull refs on different remotes. The rest should flow from that.
Why not? Git is all about flexibility. Of course if you organise your
submodules in chaos you will get chaos. But consider this:
You have this big project which consists of submodules (e.g. like Android
with hundreds of submodules). Now you want to develop on something that
involves just a subset of submodules, let's say two submodules.
Now if someone just wants to publish a small change to some submodules
you are demanding to setup a mirror of *all* submodules that are in this
big project. That might not even be feasible depending on the project's
size and the remote quota. Not to speak about having to first create a
fork of hundreds of repositories. So in this situation we should support
just referring some submodules to other places.
Regarding transporting this information. If you ask someone to try out
your change it should be as simple as possible. It should be enough to
say: clone from there and check out that branch (once recursive checkout
and fetch for submodules is in place). So here we need a way to
transport this configuration for a fork.
Yes for a small project where it's feasible to simply clone all
submodules you can just say: please fork everything. But for bigger
projects that's not necessarily an option. So we should at least give the
users that option. Then it's a matter of policy how you work with a
project.
I am not saying that everything for this should be implemented in the
first steps but we should keep it in mind and design everything in such
a way that it is still possible to implement such a kind of workflow
later.
Cheers Heiko
* Re: Relative submodule URLs
2014-08-28 19:35 ` Heiko Voigt
@ 2014-08-29 15:09 ` Marc Branchaud
0 siblings, 0 replies; 25+ messages in thread
From: Marc Branchaud @ 2014-08-29 15:09 UTC (permalink / raw)
To: Heiko Voigt; +Cc: Robert Dailey, Jonathan Nieder, Git, Jens Lehmann
On 14-08-28 03:35 PM, Heiko Voigt wrote:
> On Thu, Aug 28, 2014 at 01:44:18PM -0400, Marc Branchaud wrote:
>> Heiko also said this:
>>> On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
>>>> With relative-path submodules, the push's target repo *must* also have the
>>>> submodules in their proper places, so that they can get updated.
>>>> Furthermore, if you clone a repo that has relative-path submodules you
>>>> *must* also clone the submodules.
>>>
>>> That is not true. You can have relative submodules and just clone/fetch
>>> some from a different remote. It's just a question of how to
>>> specify/transport this information.
>>
>> I meant that more as a general guideline than some kind of physical law.
>> Sure, it's possible to scatter the submodules across all sorts of hosts, but
>> it's not a good idea. When it comes to relative-path submodules, pushing and
>> fetching submodule changes in the super-repo should just involve the one
>> remote host (whatever way that's determined). This keeps things tractable,
>> because otherwise your branch's changes are scattered among many different
>> hosts and you end up considering weird things like "this part of the branch's
>> changes are on host A but this other part are on host B, so let's record that
>> somewhere, oh but what if host B is down when I'm trying to fetch, but I know
>> that host C has the changes too so why don't I just fetch what I want from
>> there".
>>
>> It's a nightmare. It's infinitely better to treat a repository and its
>> relative-path submodules as an atomic unit, so that any remote that hosts the
>> repository also hosts the submodules. When pushing a branch with submodule
>> changes, expect to find those submodules on the target remote and update
>> them. Regardless of how the target remote is determined. Same thing for
>> fetching. It's just so much simpler to work this way.
>
> You are right, it's simpler. But I would not say "better". Depending on
> your project it might be "better" to just fork some submodules.
I think we need a clear definition of "fork" here. Are you concerned that
there are copies of the submodule repositories that are "unused" in the
branch? (Indeed, yes, you are, as I see below.)
>> So please, let's not try to specify submodule remotes per-branch or make that
>> info pushable. It's enough for a branch's local configuration to say that it
>> tracks fetch/pull refs on different remotes. The rest should flow from that.
>
> Why not? Git is all about flexibility. Of course if you organise your
> submodules in chaos you will get chaos. But consider this:
>
> You have this big project which consists of submodules (e.g. like Android
> with hundreds of submodules). Now you want to develop on something that
> involves just a subset of submodules, let's say two submodules.
>
> Now if someone just wants to publish a small change to some submodules
> you are demanding to setup a mirror of *all* submodules that are in this
> big project. That might not even be feasible depending on the project's
> size and the remote quota. Not to speak about having to first create a
> fork of hundreds of repositories. So in this situation we should support
> just referring some submodules to other places.
I feel that this scenario is something of a straw-man. At the very least,
the developer already has a clone of all the submodules. Disk space is cheap.
(If the developer doesn't need all the submodules then I suggest that the
super-project is badly organized and should use intermediate submodules to
make it easier for developers to focus on isolated areas. That being said, I
can appreciate that repository hygiene is more art than science, and that a
large and/or long-lived project could end up with some pretty funky
configurations.)
> Regarding transporting this information. If you ask someone to try out
> your change it should be as simple as possible. It should be enough to
> say: clone from there and check out that branch (once recursive checkout
> and fetch for submodules is in place). So here we need a way to
> transport this configuration for a fork.
You're assuming that the super-project is organized in such a way that
submodule-reliant code changes can live in isolation from the rest of the
project. That's a bit like saying you can try out a change in gitk without
having the rest of git. The super-project exists as a complete thing, and I
don't believe there are many projects where it would make sense to only try
out a change in isolation. I'm not familiar with the Android project, but
I'd be mighty impressed if changes to any arbitrary subset of its submodules
could be thoroughly tested without a full Android system.
So I don't believe the scenario you're suggesting is at all simple. The
person trying out the change can't just "clone from there" because the
submodules unaffected by the branch aren't there. At the very least this
person needs to start with "origin" clones of the super-project and all of
its required submodules, not just the ones changed in the branch. Then the
person can add the "fork" host as a remote and fetch the branch.
But it's still not that simple. Because now you're expecting that the branch
somehow has information that overrides some submodules' URLs stored in
.git/config. Coding that might be easy, I don't know, but as you say the
override information needs to be stored somewhere transportable and
branchable, like maybe a .gitmodules-fork file or something. Because
obviously different branches will have different submodule overrides.
And that makes it even more complicated! If the remote-overriding
information is stored as part of the branch then in fact that branch can't
just be merged and pushed to the "origin" host, because the submodules there
must not have their remotes overridden. So now the branch has to be changed
in order to remove the overrides. Users have to remember to do that, or
they'll break the origin's submodules. But when the branch changes suddenly
whatever people reviewed in the "fork" isn't what gets pushed back to the
"origin".
> Yes for a small project where it's feasible to simply clone all
> submodules you can just say: please fork everything. But for bigger
> projects that's not necessarily an option. So we should at least give the
> users that option. Then it's a matter of policy how you work with a
> project.
OK, but even if we want to eventually do both perhaps it would be wiser to
start with the simple fork-everything model. Maybe just teach "clone
--mirror" to also create relative-path submodules in their proper locations,
so that forks become easier.
> I am not saying that everything for this should be implemented in the
> first steps but we should keep it in mind and design everything in such
> a way that it is still possible to implement such a kind of workflow
> later.
I agree with using an incremental approach, but it's important to understand
where we want to go before suggesting a first step. I'm just trying to think
through the implications of what's been suggested. Please set me straight if
I'm not thinking about this properly.
M.