Relative submodule URLs


* Relative submodule URLs
@ 2014-08-18 18:22 Robert Dailey
  2014-08-18 20:55 ` Jonathan Nieder
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Dailey @ 2014-08-18 18:22 UTC
  To: Git

The documentation wasn't 100% clear on this, but I'm assuming by
"remote origin", it says that the relative URL is relative to the
actual remote *named* origin (and it is not using origin as just a
general terminology).

Is there a way to specify (on a per-clone basis) which named remote
will be used to calculate the URL for submodules?

Various co-workers use the remote named "central" instead of
"upstream" and "fork" instead of "origin" (because that just makes
more sense to them and it's perfectly valid).

However if relative submodules require 'origin' to exist AND also
represent the upstream repository (in triangle workflow), then this
breaks on several levels. There is also the common issue of upstream
submodules needing to be forked as well.

I'd like to know whether there is a way to work around these issues with Git 2.1.0.

Thanks in advance.


* Re: Relative submodule URLs
  2014-08-18 18:22 Relative submodule URLs Robert Dailey
@ 2014-08-18 20:55 ` Jonathan Nieder
  2014-08-19 10:24   ` Heiko Voigt
  2014-08-19 16:07   ` Robert Dailey
  0 siblings, 2 replies; 25+ messages in thread
From: Jonathan Nieder @ 2014-08-18 20:55 UTC
  To: Robert Dailey; +Cc: Git, Jens Lehmann, Heiko Voigt

Hi,

Robert Dailey wrote:

> The documentation wasn't 100% clear on this, but I'm assuming by
> "remote origin", it says that the relative URL is relative to the
> actual remote *named* origin (and it is not using origin as just a
> general terminology).

Thanks for reporting.  The remote used is the default remote that "git
fetch" without further arguments would use:

	get_default_remote () {
		curr_branch=$(git symbolic-ref -q HEAD)
		curr_branch="${curr_branch#refs/heads/}"
		origin=$(git config --get "branch.$curr_branch.remote")
		echo ${origin:-origin}
	}

The documentation is wrong.  git-fetch(1) doesn't provide a name for
this thing.  Any ideas for wording?
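The selection rule in get_default_remote can be checked in a throwaway repository. This is a minimal sketch, assuming only a POSIX shell and git on PATH; the branch name "topic" and the remote name "central" are made up for illustration, and nothing is fetched.

```shell
#!/bin/sh
# Throwaway check of the remote-selection rule quoted above.
# Assumes only a POSIX shell and git on PATH; no network access.

# Same logic as the quoted helper. "|| :" keeps a missing config key or a
# detached HEAD from aborting callers that run under "set -e".
get_default_remote () {
	curr_branch=$(git symbolic-ref -q HEAD) || :
	curr_branch="${curr_branch#refs/heads/}"
	origin=$(git config --get "branch.$curr_branch.remote") || :
	echo "${origin:-origin}"
}

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
git symbolic-ref HEAD refs/heads/topic   # name the (unborn) branch explicitly

get_default_remote                       # prints: origin
git config branch.topic.remote central   # "topic" now tracks a remote named "central"
get_default_remote                       # prints: central
```

In other words, the remote is chosen per checked-out branch, falling back to the literal name "origin" only when that branch has no branch.<name>.remote setting.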

> Is there a way to specify (on a per-clone basis) which named remote
> will be used to calculate the URL for submodules?

Currently there isn't, short of reconfiguring the remote used by
default by "git fetch".

> Various co-workers use the remote named "central" instead of
> "upstream" and "fork" instead of "origin" (because that just makes
> more sense to them and it's perfectly valid).
>
> However if relative submodules require 'origin' to exist AND also
> represent the upstream repository (in triangle workflow), then this
> breaks on several levels.

Can you explain further?  In a triangle workflow, "git fetch" will
pull from the 'origin' remote by default and will push to the remote
named in the '[remote] pushdefault' setting (see "remote.pushdefault"
in git-config(1)).  So you can do

	[remote]
		pushDefault = whereishouldpush

and then 'git fetch' and 'git fetch --recurse-submodules' will fetch
from "origin" and 'git push' will push to the whereishouldpush remote.
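The same triangular setup can be written with git config commands. This is a sketch with placeholder remote names and example.com URLs; "git remote add" only records configuration, so nothing is contacted.

```shell
#!/bin/sh
# Sketch of the triangular setup described above. Remote names and URLs
# are placeholders; "git remote add" only writes config, nothing is fetched.

tri=$(mktemp -d)
git init -q "$tri"
cd "$tri" || exit 1

git remote add origin https://example.com/project.git                 # where work comes from
git remote add whereishouldpush https://example.com/me/project.git    # where work is published

# Same effect as the "[remote] pushDefault = whereishouldpush" snippet.
git config remote.pushDefault whereishouldpush

git config --get remote.pushDefault   # prints: whereishouldpush
```

Because no branch.<name>.remote is touched here, the default remote used by "git fetch" (and, per the helper quoted earlier, for relative submodule URLs) stays "origin"; only pushes are redirected.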

It might make sense to introduce a new

	[remote]
		default = whereishouldfetch

setting to allow the name "origin" above to be replaced, too.  Is that
what you mean?

Meanwhile it is hard to fork a project that uses relative submodule
URLs without also forking the submodules (or, conversely, to fork some
of the submodules of a project that uses absolute submodule URLs).
That's a real and serious problem but I'm not sure how it relates to
the names of remotes.  My preferred fix involves teaching git to read
a refs/meta/git (or similarly named) ref when cloning a project with
submodules and let settings from .gitmodules in that ref override
.gitmodules in other branches.  Is that what you were referring to?

Curious,
Jonathan


* Re: Relative submodule URLs
  2014-08-18 20:55 ` Jonathan Nieder
@ 2014-08-19 10:24   ` Heiko Voigt
  2014-08-19 16:15     ` Robert Dailey
  2014-08-19 16:07   ` Robert Dailey
  1 sibling, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-19 10:24 UTC
  To: Jonathan Nieder; +Cc: Robert Dailey, Git, Jens Lehmann

On Mon, Aug 18, 2014 at 01:55:05PM -0700, Jonathan Nieder wrote:
> Robert Dailey wrote:
> 
> > The documentation wasn't 100% clear on this, but I'm assuming by
> > "remote origin", it says that the relative URL is relative to the
> > actual remote *named* origin (and it is not using origin as just a
> > general terminology).
> 
> Thanks for reporting.  The remote used is the default remote that "git
> fetch" without further arguments would use:
> 
> 	get_default_remote () {
> 		curr_branch=$(git symbolic-ref -q HEAD)
> 		curr_branch="${curr_branch#refs/heads/}"
> 		origin=$(git config --get "branch.$curr_branch.remote")
> 		echo ${origin:-origin}
> 	}
> 
> The documentation is wrong.  git-fetch(1) doesn't provide a name for
> this thing.  Any ideas for wording?

How about 'upstream'? Like this[1]?

Let's step back a little: is this really what we want in such a situation? Is one
remote really the answer here? I suppose you have relative URLs in your
.gitmodules file and two remotes in your superproject, right?

What you want is that the remote names in the superproject are reflected in the
submodules when you initialise and update them?

At the moment you always get a remote named 'origin' in the
submodule, even if that remote was called 'fork' in the superproject.

Maybe in the relative URLs case we should teach the clone in submodule update
to use all remotes with their names from the superproject?

Would that solve your issue?

> > Is there a way to specify (on a per-clone basis) which named remote
> > will be used to calculate the URL for submodules?
> 
> Currently there isn't, short of reconfiguring the remote used by
> default by "git fetch".

Well, currently it is either the remote tracked by the currently checked-out
branch or, if that branch has no tracked remote configured, 'origin'.

So by configuring (or checking out a branch with) a different remote, you can
choose which remote the submodules are cloned from. No?
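Which remote gets picked matters because the relative path from .gitmodules is applied to that remote's URL, using the rule from the git-submodule documentation (a leading "../" strips one component, like relative directories). The helper below is a hypothetical, simplified sketch of that rule, loosely modeled on the shell-era git-submodule.sh and not covering every edge case; the example.com URLs are placeholders.

```shell
#!/bin/sh
# Simplified sketch of relative-URL resolution for submodules: each leading
# "../" strips one component from the remote URL, "./" is dropped, and the
# rest is appended. Hypothetical helper; real git handles more edge cases.
# Note: uses global variables (plain POSIX sh has no "local").

resolve_relative_url_sketch () {
	base=${1%/}   # URL of the remote that was picked (trailing "/" removed)
	rel=$2        # relative URL as written in .gitmodules
	sep=/
	while :; do
		case $rel in
		../*)
			rel=${rel#../}
			case $base in
			*/*) base=${base%/*} ;;            # drop the last path component
			*:*) base=${base%:*}; sep=: ;;     # scp-style "host:repo.git"
			*) echo "cannot strip a component from '$base'" >&2; return 1 ;;
			esac
			;;
		./*) rel=${rel#./} ;;
		*) break ;;
		esac
	done
	printf '%s%s%s\n' "$base" "$sep" "${rel%/}"
}

# The same .gitmodules entry resolves differently depending on the remote:
resolve_relative_url_sketch https://example.com/team/app.git ../lib.git
# prints: https://example.com/team/lib.git
resolve_relative_url_sketch https://example.com/me/app.git ../lib.git
# prints: https://example.com/me/lib.git   (a fork: lib.git must exist there too)
resolve_relative_url_sketch https://example.com/team/app.git ./lib.git
# prints: https://example.com/team/app.git/lib.git   (why the docs say ../)
```

The second call shows the forking problem raised in this thread: if the chosen remote is a fork, every submodule addressed by a relative URL has to be forked alongside it.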

> > Various co-workers use the remote named "central" instead of
> > "upstream" and "fork" instead of "origin" (because that just makes
> > more sense to them and it's perfectly valid).
> >
> > However if relative submodules require 'origin' to exist AND also
> > represent the upstream repository (in triangle workflow), then this
> > breaks on several levels.
> 
> Can you explain further?  In a triangle workflow, "git fetch" will
> pull from the 'origin' remote by default and will push to the remote
> named in the '[remote] pushdefault' setting (see "remote.pushdefault"
> in git-config(1)).  So you can do
> 
> 	[remote]
> 		pushDefault = whereishouldpush
> 
> and then 'git fetch' and 'git fetch --recurse-submodules' will fetch
> from "origin" and 'git push' will push to the whereishouldpush remote.
> 
> It might make sense to introduce a new
> 
> 	[remote]
> 		default = whereishouldfetch
> 
> setting to allow the name "origin" above to be replaced, too.  Is that
> what you mean?

I think the OP's problem stems from having a branch that does not have a
remote configured. Since they do not have a remote named 'origin',

	git submodule update --init --recursive path/to/submodule

fails. Right?

Cheers Heiko

[1]
From: Heiko Voigt <hvoigt@hvoigt.net>
Subject: [PATCH] submodule: use 'upstream' instead of 'origin' in
 documentation

When talking about relative URLs it is ambiguous to use the term
'origin', since that might denote the default remote name 'origin'. Let's
use 'upstream' to make it clearer that the upstream repository is
meant.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
 Documentation/git-submodule.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 8e6af65..c6f82e6 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -80,15 +80,15 @@ to exist in the superproject. If <path> is not given, the
 The <path> is also used as the submodule's logical name in its
 configuration entries unless `--name` is used to specify a logical name.
 +
-<repository> is the URL of the new submodule's origin repository.
+<repository> is the URL of the new submodule's upstream repository.
 This may be either an absolute URL, or (if it begins with ./
-or ../), the location relative to the superproject's origin
+or ../), the location relative to the superproject's upstream
 repository (Please note that to specify a repository 'foo.git'
 which is located right next to a superproject 'bar.git', you'll
 have to use '../foo.git' instead of './foo.git' - as one might expect
 when following the rules for relative URLs - because the evaluation
 of relative URLs in Git is identical to that of relative directories).
-If the superproject doesn't have an origin configured
+If the superproject doesn't have any remote configured
 the superproject is its own authoritative upstream and the current
 working directory is used instead.
 +
-- 
2.1.0.rc0.52.gaa544bf


* Re: Relative submodule URLs
  2014-08-19 10:24   ` Heiko Voigt
@ 2014-08-19 16:15     ` Robert Dailey
  2014-08-19 16:39       ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Dailey @ 2014-08-19 16:15 UTC
  To: Heiko Voigt; +Cc: Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 19, 2014 at 5:24 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> I think the OP's problem stems from having a branch that does not have a
> remote configured. Since they do not have a remote named 'origin',
>
>         git submodule update --init --recursive path/to/submodule
>
> fails. Right?

Not exactly. The issue is that there is a tug of war going on between
three specific commands (all of which utilize the tracked remote):

git fetch
git pull
git submodule update (for relative submodules)

The way I set up my remote tracking branch will be different for each
of these commands:

- git pull :: If I want convenient pulls (with rebase), I will track
my upstream branch. My pushes have to be more explicit as a tradeoff.
- git push :: If I want convenient pushes, track my origin branch.
Pulls become less convenient. My relative submodules will now need to
be forked.
- git submodule update :: I track upstream to avoid forking my
submodules. But pushes become more inconvenient.

As you can see, I feel like we're overusing the single remote setting.
Sure, we've added some global settings to set default push/pull
remotes and such, but I don't think that is a sustainable long term
solution. I like the idea of possibly introducing multiple tracking
remotes for various purposes. This adds slightly more configuration
overhead, but git is already very config heavy so it might
be worth exploring. At least, this feels like a better thing for the
long term as I won't be constantly switching my tracking remote for
various purposes.

Could also explore the possibility of creating "const remotes". If we
specify a remote MUST exist for relative submodules, git can create it
for us, and fail to operate without it. It's up to the user to map
"fork" to "origin" if needed (perhaps add a `git remote clone <source>
<new remote>` to assist with this)?

Various approaches we can take, but I don't do development on Git so
I'm not sure what makes the most sense.


* Re: Relative submodule URLs
  2014-08-19 16:15     ` Robert Dailey
@ 2014-08-19 16:39       ` Junio C Hamano
  2014-08-19 16:50         ` Robert Dailey
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2014-08-19 16:39 UTC
  To: Robert Dailey; +Cc: Heiko Voigt, Jonathan Nieder, Git, Jens Lehmann

Robert Dailey <rcdailey.lists@gmail.com> writes:

> The way I set up my remote tracking branch will be different for each
> of these commands:
>
> - git pull :: If I want convenient pulls (with rebase), I will track
> my upstream branch. My pushes have to be more explicit as a tradeoff.

Keeping 'origin' pointing at the repository where you cloned from,
without doing anything funky (i.e. "set up my remote") would give
you convenient pulls.

> - git push :: If I want convenient pushes, track my origin branch.
> Pulls become less convenient. My relative submodules will now need to
> be forked.

You need to configure your pushes to go to a different place, if you
want them to go to a different place ;-).

A long time ago, it used to be that you had to affect the URL used in
both directions, making pulls less convenient, but hasn't this been
made a non-issue for triangular workflows with the introduction of
remote.pushdefault some time ago?

> - git submodule update :: I track upstream to avoid forking my
> submodules. But pushes become more inconvenient.

If 'submodule update' follows the same place as 'pull' goes by
default, I would imagine that there is no issue here, no?  Am I
oversimplifying the issue by guessing that the root cause is that
you are not using remote.pushdefault from your configuration
toolchest and instead setting the 'origin' to a wrong (i.e. where
push goes) place?


* Re: Relative submodule URLs
  2014-08-19 16:39       ` Junio C Hamano
@ 2014-08-19 16:50         ` Robert Dailey
  2014-08-19 19:19           ` Junio C Hamano
  2014-08-19 19:30           ` Heiko Voigt
  0 siblings, 2 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-19 16:50 UTC
  To: Junio C Hamano; +Cc: Heiko Voigt, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 19, 2014 at 11:39 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Robert Dailey <rcdailey.lists@gmail.com> writes:
>
>> The way I set up my remote tracking branch will be different for each
>> of these commands:
>>
>> - git pull :: If I want convenient pulls (with rebase), I will track
>> my upstream branch. My pushes have to be more explicit as a tradeoff.
>
> Keeping 'origin' pointing at the repository where you cloned from,
> without doing anything funky (i.e. "set up my remote") would give
> you convenient pulls.
>
>> - git push :: If I want convenient pushes, track my origin branch.
>> Pulls become less convenient. My relative submodules will now need to
>> be forked.
>
> You need to configure your pushes to go to a different place, if you
> want them to go to a different place ;-).
>
> A long time ago, it used to be that you had to affect the URL used in
> both directions, making pulls less convenient, but hasn't this been
> made a non-issue for triangular workflows with the introduction of
> remote.pushdefault some time ago?
>
>> - git submodule update :: I track upstream to avoid forking my
>> submodules. But pushes become more inconvenient.
>
> If 'submodule update' follows the same place as 'pull' goes by
> default, I would imagine that there is no issue here, no?  Am I
> oversimplifying the issue by guessing that the root cause is that
> you are not using remote.pushdefault from your configuration
> toolchest and instead setting the 'origin' to a wrong (i.e. where
> push goes) place?

Maybe I'm misunderstanding something here and you can help me out.

All the reading I've done (mostly github) says that 'upstream' points
to the authoritative repository that you forked from but do not have
permissions to write to. 'origin' points to the place you push your
changes for pull requests (the fork).

Basically the workflow I use is:

- Use 'upstream' to PULL changes (latest code is obtained from the
authoritative repository)
- Use 'origin' to push my branches. I never modify the
branches that exist in 'upstream' on my 'origin' (I do everything
through separate personal branches).

That means if I have a branch off of 'master' named 'topic', I will
track 'upstream/master' and get latest with 'git pull'. When I'm ready
for a pull request, I do 'git push origin' (I use push.default =
simple).

According to my understanding, relative submodules work here. But not
everyone on my team uses this workflow. Sometimes they track
"origin/topic" (if we use my example again). Won't the submodule try
to find itself on the fork now?
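One way to see which remote each branch would hand to a first "git submodule update --init" is to list branch.<name>.remote for every local branch. This is a sketch under the rule described in this thread (the branch's configured remote, else "origin"); the helper name, branch names, and remote names are hypothetical, and later Git versions may refine the fallback.

```shell
#!/bin/sh
# Sketch: list, for every local branch, which remote a first
# "git submodule update --init" would resolve relative URLs against,
# following the rule described in this thread (branch.<name>.remote,
# falling back to "origin"). Branch and remote names are placeholders.

report_submodule_base_remote () {
	git for-each-ref --format='%(refname:short)' refs/heads |
	while read -r b; do
		r=$(git config --get "branch.$b.remote") || r=origin
		printf '%s -> %s\n' "$b" "$r"
	done
}

team=$(mktemp -d)
git init -q "$team"
cd "$team" || exit 1
git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
	commit -q --allow-empty -m init

# Robert's style: topic tracks upstream/master.
git branch topic
git config branch.topic.remote upstream
git config branch.topic.merge refs/heads/master

# A colleague's style: their-topic tracks origin/topic (origin = the fork).
git branch their-topic
git config branch.their-topic.remote origin
git config branch.their-topic.merge refs/heads/topic

report_submodule_base_remote
```

With this naming, "their-topic" resolves relative submodule URLs against the fork, which is the situation asked about above.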

Basically it seems like what you're advocating is that I need to
enforce a policy of "always track upstream" and "never track origin"
and "always set remote.pushdefault". Seems a bit error prone...


* Re: Relative submodule URLs
  2014-08-19 16:50         ` Robert Dailey
@ 2014-08-19 19:19           ` Junio C Hamano
  2014-08-19 20:18             ` Robert Dailey
  2014-08-19 19:30           ` Heiko Voigt
  1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2014-08-19 19:19 UTC
  To: Robert Dailey; +Cc: Heiko Voigt, Jonathan Nieder, Git, Jens Lehmann

Robert Dailey <rcdailey.lists@gmail.com> writes:

> Maybe I'm misunderstanding something here and you can help me out.
>
> All the reading I've done (mostly github) says that 'upstream' points
> to the authoritative repository that you forked from but do not have
> permissions to write to. 'origin' points to the place you push your
> changes for pull requests (the fork).

I do not know if that is how GitHub teaches people, but I would have
to say that this is strange phrasing.  I suspect that part of
their documentation was written a long time ago, back when nobody on
the GitHub side was involved in the design (let alone implementation)
of Git, and I would take it with a grain of salt.

Having said that, here is a summary of the current support for
referring to different repositories in Git:

   The word 'origin' refers to where things originate from; a place
   you push to is where things go, so it makes no sense to use that
   word to refer to the repository where you publish your work
   result.  The 'origin' may or may not be where you can push (or
   you would want to push) to.  It is where you 'pull' from to
   synchronize with the 'upstream'.

   The 'upstream' in SCM context refers to those who control a
   logically more authoritative history than you, whose work you
   derive your work from, i.e. synonymous to 'origin'.

   For people like Linus (i.e. he may pull from others but that is
   to take in changes made as derived work; he does not pull to grab
   more authoritative work), there is no need to say 'upstream'; or
   you can consider he is his own 'upstream'.

   For those who use the CVS-style central repository model (i.e. they
   pull from that single central shared repository, and push
   their work back to the same repository), 'origin' is writable to
   them and they push to it.  For people with a CVS-style central
   shared repository model, their central repository is their
   'upstream' with respect to their local copy.

   Since these two classes of people need just one other repository
   to refer to, we just used 'origin' when we did the very initial
   version of "git clone", and these users can keep using that name
   to refer to that single other repository they interact with.

   The support for the triangular workflow in which you pull from
   one place and push the result of work to another, which the
   configuration variable 'remote.pushdefault' is a part of, is
   a relatively recent development in Git.  I am not sure we
   have added an official term to our glossary to refer to the
   repository you push your work result to, but in the discussions
   we have seen phrases like 'publishing repository' used, I think.
   It must be writable by you, of course, and it may or may not be
   the same as the 'origin' repository.


* Re: Relative submodule URLs
  2014-08-19 19:19           ` Junio C Hamano
@ 2014-08-19 20:18             ` Robert Dailey
  0 siblings, 0 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-19 20:18 UTC
  To: Junio C Hamano; +Cc: Heiko Voigt, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 19, 2014 at 2:19 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> I do not know if that is how GitHub teaches people, but I would have
> to say that this is strange phrasing.  I suspect that part of
> their documentation was written a long time ago, back when nobody on
> the GitHub side was involved in the design (let alone implementation)
> of Git, and I would take it with a grain of salt.
>
> Having said that, here is a summary of the current support for
> referring to different repositories in Git:
>
> <snip>

Wow, that was a very good read. Thank you for that. I definitely have
been using the wrong terms. upstream & origin are interchangeable, yet
I was using them to represent two distinct repositories.

I think going forward my central repository will be named 'origin' and
for the name of the second, nothing has really jumped out at me yet
but it'll either be "fork" or "proxy"... "surrogate" would be nice too
if it wasn't such a long word in comparison.

I'm sure you guys will find a name for it in good time. I wonder what
Linus would suggest.


* Re: Re: Relative submodule URLs
  2014-08-19 16:50         ` Robert Dailey
  2014-08-19 19:19           ` Junio C Hamano
@ 2014-08-19 19:30           ` Heiko Voigt
  2014-08-19 20:23             ` Robert Dailey
  1 sibling, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-19 19:30 UTC
  To: Robert Dailey; +Cc: Junio C Hamano, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 19, 2014 at 11:50:08AM -0500, Robert Dailey wrote:
> Maybe I'm misunderstanding something here and you can help me out.
> 
> All the reading I've done (mostly github) says that 'upstream' points
> to the authoritative repository that you forked from but do not have
> permissions to write to. 'origin' points to the place you push your
> changes for pull requests (the fork).
> 
> Basically the workflow I use is:
> 
> - Use 'upstream' to PULL changes (latest code is obtained from the
> authoritative repository)
> - Use 'origin' to push my branches. I never modify the
> branches that exist in 'upstream' on my 'origin' (I do everything
> through separate personal branches).
> 
> That means if I have a branch off of 'master' named 'topic', I will
> track 'upstream/master' and get latest with 'git pull'. When I'm ready
> for a pull request, I do 'git push origin' (I use push.default =
> simple).
> 
> According to my understanding, relative submodules work here. But not
> everyone on my team uses this workflow. Sometimes they track
> "origin/topic" (if we use my example again). Won't the submodule try
> to find itself on the fork now?

Well the remote for the submodule is currently only calculated once,
when you do the initial

	git submodule update --init

that clones the submodule. Afterwards the fixed url is configured under
the name 'origin' in the submodule like in a normal git repository that
you have freshly cloned. Which remote is used for cloning depends on the
configured remote for the current branch or 'origin'.

When you do a fetch or push with --recurse-submodules it only executes a
'git fetch' or 'git push' without any specific remote. For fetch the
same commandline options (but only the options) are passed on.

Here it might make sense to guess the remote in the submodule somehow
and not do what fetch without remotes would do.

For the triangular workflow, not much work has been done with regard to
submodule support.

But since a submodule behaves like a normal git repository maybe there
is not much work needed and we can just point to the workflow without
submodules most times. We still have to figure that out properly.

Cheers Heiko


* Re: Re: Relative submodule URLs
  2014-08-19 19:30           ` Heiko Voigt
@ 2014-08-19 20:23             ` Robert Dailey
  2014-08-19 20:57               ` Heiko Voigt
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Dailey @ 2014-08-19 20:23 UTC
  To: Heiko Voigt; +Cc: Junio C Hamano, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 19, 2014 at 2:30 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> Well the remote for the submodule is currently only calculated once,
> when you do the initial
>
>         git submodule update --init
>
> that clones the submodule. Afterwards the fixed url is configured under
> the name 'origin' in the submodule like in a normal git repository that
> you have freshly cloned. Which remote is used for cloning depends on the
> configured remote for the current branch or 'origin'.
>
> When you do a fetch or push with --recurse-submodules it only executes a
> 'git fetch' or 'git push' without any specific remote. For fetch the
> same commandline options (but only the options) are passed on.
>
> Here it might make sense to guess the remote in the submodule somehow
> and not do what fetch without remotes would do.
>
> For the triangular workflow not much work has been done in regards to
> submodule support.
>
> But since a submodule behaves like a normal git repository maybe there
> is not much work needed and we can just point to the workflow without
> submodules most times. We still have to figure that out properly.

Maybe then the only thing we need is a --with-remote option for git
submodule? ::

git submodule update --init --with-remote myremote

The --with-remote option would be a NOOP if it's already initialized,
as you say. But I could create an alias for this as needed to make
sure it is always specified.

That way, just in case someone cloned with their fork (in which case
'origin' would not be pointing in the right place), they could tell it
to use `myremote`. This is really the only strange case to handle
right now (people that clone their forks instead of the actual
upstream/central repository).


* Re: Re: Re: Relative submodule URLs
  2014-08-19 20:23             ` Robert Dailey
@ 2014-08-19 20:57               ` Heiko Voigt
  2014-08-20 13:18                 ` Robert Dailey
  0 siblings, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-19 20:57 UTC
  To: Robert Dailey; +Cc: Junio C Hamano, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 19, 2014 at 03:23:36PM -0500, Robert Dailey wrote:
> On Tue, Aug 19, 2014 at 2:30 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> > Well the remote for the submodule is currently only calculated once,
> > when you do the initial
> >
> >         git submodule update --init
> >
> > that clones the submodule. Afterwards the fixed url is configured under
> > the name 'origin' in the submodule like in a normal git repository that
> > you have freshly cloned. Which remote is used for cloning depends on the
> > configured remote for the current branch or 'origin'.
> >
> > When you do a fetch or push with --recurse-submodules it only executes a
> > 'git fetch' or 'git push' without any specific remote. For fetch the
> > same commandline options (but only the options) are passed on.
> >
> > Here it might make sense to guess the remote in the submodule somehow
> > and not do what fetch without remotes would do.
> >
> > For the triangular workflow not much work has been done in regards to
> > submodule support.
> >
> > But since a submodule behaves like a normal git repository maybe there
> > is not much work needed and we can just point to the workflow without
> > submodules most times. We still have to figure that out properly.
> 
> Maybe then the only thing we need is a --with-remote option for git
> submodule? ::
> 
> git submodule update --init --with-remote myremote
> 
> The --with-remote option would be a NOOP if it's already initialized,
> as you say. But I could create an alias for this as needed to make
> sure it is always specified.

I would actually error out when it is specified in an already cloned
state, because otherwise the user might expect the remote to be updated.

Since we are currently busy implementing recursive fetch and checkout I have
added that to our ideas list[1] so we do not forget about it.

In the meantime you can either use the branch.<name>.remote
configuration to define a remote to use or just use 'origin'.
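The branch.<name>.remote workaround can be wrapped so that the setting only applies for one command. This is a sketch of an interim stand-in for the proposed --with-remote option, not a git feature: "with_default_remote" is a hypothetical helper name, and it assumes the Git 2.1-era behaviour described in this thread, where the first clone of a submodule resolves relative URLs via the current branch's remote. As Heiko notes, already-cloned submodules are not affected.

```shell
#!/bin/sh
# Sketch of an interim stand-in for the proposed --with-remote option:
# temporarily set branch.<current>.remote, run a command, restore it.
# "with_default_remote" is a hypothetical helper name, not a git command.
# Deliberately not using "set -e" so the restore step always runs.

with_default_remote () {
	remote=$1
	shift
	branch=$(git symbolic-ref -q --short HEAD) || {
		echo "with_default_remote: HEAD is detached" >&2
		return 1
	}
	key="branch.$branch.remote"
	if old=$(git config --get "$key"); then had_old=yes; else had_old=no; fi

	git config "$key" "$remote"
	"$@"
	status=$?

	# Put the previous setting back (or remove the temporary one).
	if [ "$had_old" = yes ]; then
		git config "$key" "$old"
	else
		git config --unset "$key"
	fi
	return "$status"
}

# Usage (in a superproject whose .gitmodules uses relative URLs):
#   with_default_remote central git submodule update --init
```

Restoring the previous value keeps plain "git fetch" and "git pull" on the branch behaving as before once the wrapped command finishes.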

Cheers Heiko

[1] https://github.com/jlehmann/git-submod-enhancements/wiki#add-with-remote--switch-to-submodule-update


* Re: Re: Re: Relative submodule URLs
  2014-08-19 20:57               ` Heiko Voigt
@ 2014-08-20 13:18                 ` Robert Dailey
  2014-08-21 12:37                   ` Heiko Voigt
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Dailey @ 2014-08-20 13:18 UTC
  To: Heiko Voigt; +Cc: Junio C Hamano, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 19, 2014 at 3:57 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> I would actually error out when it is specified in an already cloned
> state, because otherwise the user might expect the remote to be updated.
>
> Since we are currently busy implementing recursive fetch and checkout I have
> added that to our ideas list[1] so we do not forget about it.
>
> In the meantime you can either use the branch.<name>.remote
> configuration to define a remote to use or just use 'origin'.
>
> Cheers Heiko
>
> [1] https://github.com/jlehmann/git-submod-enhancements/wiki#add-with-remote--switch-to-submodule-update

Thanks Heiko.

I would offer to help implement this for you, if you find it to be a
good idea, but I've never done git development before and based on
what I've seen it seems like you need to know at least 2-3 languages
to contribute: bash, perl, C++. I know C++ & Python but I don't know
perl or bash scripting language.

What would it take to help you guys out? It's easy to complain & file
bugs but as a developer I feel like I should offer more, if it suits
you.

Let me know I'm happy to help with anything. Thanks again!!


* Re: Re: Re: Relative submodule URLs
  2014-08-20 13:18                 ` Robert Dailey
@ 2014-08-21 12:37                   ` Heiko Voigt
  0 siblings, 0 replies; 25+ messages in thread
From: Heiko Voigt @ 2014-08-21 12:37 UTC
  To: Robert Dailey; +Cc: Junio C Hamano, Jonathan Nieder, Git, Jens Lehmann

On Wed, Aug 20, 2014 at 08:18:12AM -0500, Robert Dailey wrote:
> On Tue, Aug 19, 2014 at 3:57 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> > I would actually error out when it is specified in an already cloned
> > state, because otherwise the user might expect the remote to be updated.
> >
> > Since we are currently busy implementing recursive fetch and checkout I have
> > added that to our ideas list[1] so we do not forget about it.
> >
> > In the meantime you can either use the branch.<name>.remote
> > configuration to define a remote to use or just use 'origin'.
> >
> > Cheers Heiko
> >
> > [1] https://github.com/jlehmann/git-submod-enhancements/wiki#add-with-remote--switch-to-submodule-update
> 
> Thanks Heiko.
> 
> I would offer to help implement this for you, if you find it to be a
> good idea, but I've never done git development before and based on
> what I've seen it seems like you need to know at least 2-3 languages
> to contribute: bash, perl, C++. I know C++ & Python but I don't know
> perl or bash scripting language.
> 
> What would it take to help you guys out? It's easy to complain & file
> bugs but as a developer I feel like I should offer more, if it suits
> you.

For this particular case shell scripting should be sufficient. And it
should not take too much time. Have a look at the git-submodule.sh
script in the repository. That is the one implementing the git submodule
command.

Additionally you need to extend the documentation and write a test or
two. Writing a test is also done in shell script. The documentation[1] is
in asciidoc, which is pretty self-explanatory.

The test should probably go into t/t7406-submodule-update.sh and, as
Phil pointed out, into t/t7403-submodule-sync.sh.

Also make sure to read the shell scripting part in
Documentation/CodingGuidelines and as a general rule: Keep close to the
style you find in the file. And when you are ready to send a patch:
Documentation/SubmittingPatches.

If you are happy with your change but unsure about anything, just send a
patch with your implementation (CC me and everyone involved) and we will
discuss it here on the list.

Cheers Heiko

[1] Documentation/git-submodule.txt

* Re: Relative submodule URLs
  2014-08-18 20:55 ` Jonathan Nieder
  2014-08-19 10:24   ` Heiko Voigt
@ 2014-08-19 16:07   ` Robert Dailey
  2014-08-22 16:00     ` Marc Branchaud
  1 sibling, 1 reply; 25+ messages in thread
From: Robert Dailey @ 2014-08-19 16:07 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git, Jens Lehmann, Heiko Voigt

On Mon, Aug 18, 2014 at 3:55 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Thanks for reporting.  The remote used is the default remote that "git
> fetch" without further arguments would use:
>
>         get_default_remote () {
>                 curr_branch=$(git symbolic-ref -q HEAD)
>                 curr_branch="${curr_branch#refs/heads/}"
>                 origin=$(git config --get "branch.$curr_branch.remote")
>                 echo ${origin:-origin}
>         }
>
> The documentation is wrong.  git-fetch(1) doesn't provide a name for
> this thing.  Any ideas for wording?

I guess a good start would be to call it the "tracked remote" instead
of "remote origin". The word "tracked" here makes it very obvious that
if I have a remote tracking branch setup, it will use the remote
portion of that configuration.

The real question is, how will `git submodule update` function if a
tracking remote is not configured? Will it fail with some useful error
message? I don't like the idea of it defaulting to "self remote" mode,
where it will be relative to my repo directory. That seems like it
would fail in subtle ways in a worst-case scenario (if I did by
happenstance have a bare repository cloned up one directory level for
other reasons).

> Currently there isn't, short of reconfiguring the remote used by
> default by "git fetch".

I wish there was a way to specify the remote on a per-branch basis
separately from the tracking branch. I read a while back that someone
proposed some changes to git to support decentralized tracking
(concept of an upstream tracking branch and a separate one for origin,
I think). If that were implemented, then relative submodules could
utilize the 'upstream' remote by default for each branch, which would
provide more predictable default behavior. Right now most people on my
team would not be aware that if they tracked a branch on their fork,
they would subsequently need to fork the submodules to that same
remote.

>> Various co-workers use the remote named "central" instead of
>> "upstream" and "fork" instead of "origin" (because that just makes
>> more sense to them and it's perfectly valid).
>>
>> However if relative submodules require 'origin' to exist AND also
>> represent the upstream repository (in triangle workflow), then this
>> breaks on several levels.
>
> Can you explain further?  In a triangle workflow, "git fetch" will
> pull from the 'origin' remote by default and will push to the remote
> named in the '[remote] pushdefault' setting (see "remote.pushdefault"
> in git-config(1)).  So you can do
>
>         [remote]
>                 pushDefault = whereishouldpush
>
> and then 'git fetch' and 'git fetch --recurse-submodules' will fetch
> from "origin" and 'git push' will push to the whereishouldpush remote.

I didn't know about this option, seems useful.
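For reference, a sketch of how the fetch and push remotes are chosen in such a triangle setup, following the precedence git-config(1) describes for `branch.<name>.pushRemote` and `remote.pushDefault`. Relative submodule URLs follow only the fetch side (the `get_default_remote` lookup quoted above reads only `branch.<name>.remote`), so setting `remote.pushDefault` alone does not change them. The function names are hypothetical, and config values are passed as arguments rather than read from a repository.

```sh
#!/bin/sh
# Models remote selection in a triangle workflow. Config values are passed
# as arguments (empty string = unset), so no repository is needed.
#   $1  branch.<name>.remote
#   $2  branch.<name>.pushRemote
#   $3  remote.pushDefault

# Fetch (and relative submodule URLs) use the tracked remote, else "origin".
fetch_remote () {
	echo "${1:-origin}"
}

push_remote () {
	if test -n "$2"
	then
		echo "$2"              # per-branch push override wins
	elif test -n "$3"
	then
		echo "$3"              # then the repository-wide push default
	else
		fetch_remote "$1"      # otherwise push where we fetch from
	fi
}

# Jonathan's example: nothing set per branch, remote.pushDefault = whereishouldpush
fetch_remote ""
push_remote "" "" whereishouldpush
```

With Jonathan's example this prints `origin` for fetching and `whereishouldpush` for pushing.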

A common workflow that we use on my team is to set the tracking branch
to 'upstream' for convenient pulls with rebase. This means a feature
branch of mine can track its parent branch on 'upstream', so that when
other pull requests get merged in on the remote repo branch, I can
just do `git pull` and my feature branch rebases onto the latest of
that parent branch.

Cases like these would work with relative submodules because
'upstream' is the tracked remote (and most of the time we don't want
to fork submodules). However sometimes I like to track the same pushed
branch on origin (my fork), especially when it is up for pull request.
In these cases, my submodule update will fail because I didn't fork my
submodules when I changed my tracking branch. Is this correct?

"breaks on several levels" was basically my way of saying that various
workflow choices will break when you introduce submodules. One of the
beautiful things about Git is that it allows everyone to choose their
own workflow. But submodules seem to prevent that to some degree. I
think addressing relative submodule usability issues is the best
approach for the long term as they feel more sustainable and scalable.
It's an absolute pain to move a submodule URL; I think we've all
experienced it. It's even harder for me because I'm the go-to at work
for help with Git. Most people that aren't advanced with Git will not
know what to do without a ton of reading & such.

> It might make sense to introduce a new
>
>         [remote]
>                 default = whereishouldfetch
>
> setting to allow the name "origin" above to be replaced, too.  Is that
> what you mean?

I think you're on the right path. However I'd suggest something like
the following:

[submodule]
    remote = <remote_for_relative_submodules>   ; e.g. upstream

[branch.<name>]
    submoduleRemote = <remote_for_relative_submodules>

Above, `submodule.remote` is the 'default' remote used by all relative
submodules on all branches. If unspecified, it defaults to
`branch.<name>.remote` as it currently behaves.

`branch.<name>.submoduleRemote` is an override for `submodule.remote`.

Basically if you consider this scenario:

[branch.myfoo]
    remote = origin
    submoduleRemote = upstream

I can track an origin branch, but the submodule will refer to 'upstream'.

I can optionally set `submodule.remote` in my global config so I don't
need to set the submodule remote on each branch. This is useful if
most branches I track are on origin instead of upstream.
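The lookup order proposed above can be sketched in shell. Note that `submodule.remote` and `branch.<name>.submoduleRemote` are only proposals in this thread, not existing Git settings, and the function name is made up; to keep the sketch runnable without a repository, the three config values are passed as arguments instead of being read with `git config`.

```sh
#!/bin/sh
# Sketch of the proposed lookup order. The keys submodule.remote and
# branch.<name>.submoduleRemote are hypothetical; values are passed in as
# arguments (empty string = unset).
#   $1  branch.<name>.submoduleRemote  (proposed, per branch)
#   $2  submodule.remote               (proposed, repository/global default)
#   $3  branch.<name>.remote           (existing setting)
submodule_remote_for_branch () {
	if test -n "$1"
	then
		echo "$1"            # per-branch override wins
	elif test -n "$2"
	then
		echo "$2"            # then the default for all branches
	else
		echo "${3:-origin}"  # finally today's behaviour: tracked remote, else origin
	fi
}

# The [branch.myfoo] scenario: tracks origin, submodules resolve via upstream.
submodule_remote_for_branch upstream "" origin   # prints: upstream
```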

I'm sure I'm missing other important aspects of Git that these options
conflict with, and perhaps 2 additional options may be too redundant.
It's just some food for thought.

> Meanwhile it is hard to fork a project that uses relative submodule
> URLs without also forking the submodules (or, conversely, to fork some
> of the submodules of a project that uses absolute submodule URLs).
> That's a real and serious problem but I'm not sure how it relates to
> the names of remotes.  My preferred fix involves teaching git to read
> a refs/meta/git (or similarly named) ref when cloning a project with
> submodules and let settings from .gitmodules in that ref override
> .gitmodules in other branches.  Is that what you were referring to?

Could you explain this a bit more? What is refs/meta/git? Never heard
of that one. Does that have to be done while cloning or can an
existing repository be configured? I'm interested in your idea but it
sounds confusing to me.

Thanks for taking the time to brainstorm with me.

* Re: Relative submodule URLs
  2014-08-19 16:07   ` Robert Dailey
@ 2014-08-22 16:00     ` Marc Branchaud
  2014-08-24 13:34       ` Heiko Voigt
                         ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Marc Branchaud @ 2014-08-22 16:00 UTC (permalink / raw)
  To: Robert Dailey, Jonathan Nieder; +Cc: Git, Jens Lehmann, Heiko Voigt

On 14-08-19 12:07 PM, Robert Dailey wrote:
> On Mon, Aug 18, 2014 at 3:55 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> Thanks for reporting.  The remote used is the default remote that "git
>> fetch" without further arguments would use:
>>
>>         get_default_remote () {
>>                 curr_branch=$(git symbolic-ref -q HEAD)
>>                 curr_branch="${curr_branch#refs/heads/}"
>>                 origin=$(git config --get "branch.$curr_branch.remote")
>>                 echo ${origin:-origin}
>>         }
>>
>> The documentation is wrong.  git-fetch(1) doesn't provide a name for
>> this thing.  Any ideas for wording?
> 
> I guess a good start would be to call it the "tracked remote" instead
> of "remote origin". The word "tracked" here makes it very obvious that
> if I have a remote tracking branch setup, it will use the remote
> portion of that configuration.
> 
> The real question is, how will `git submodule update` function if a
> tracking remote is not configured? Will it fail with some useful error
> message? I don't like the idea of it defaulting to "self remote" mode,
> where it will be relative to my repo directory. That seems like it
> would fail in subtle ways in a worst-case scenario (if I did by
> happenstance have a bare repository cloned up one directory level for
> other reasons).
> 
>> Currently there isn't, short of reconfiguring the remote used by
>> default by "git fetch".
> 
> I wish there was a way to specify the remote on a per-branch basis
> separately from the tracking branch. I read a while back that someone
> proposed some changes to git to support decentralized tracking
> (concept of an upstream tracking branch and a separate one for origin,
> I think). If that were implemented, then relative submodules could
> utilize the 'upstream' remote by default for each branch, which would
> provide more predictable default behavior. Right now most people on my
> team would not be aware that if they tracked a branch on their fork,
> they would subsequently need to fork the submodules to that same
> remote.
> 
>>> Various co-workers use the remote named "central" instead of
>>> "upstream" and "fork" instead of "origin" (because that just makes
>>> more sense to them and it's perfectly valid).
>>>
>>> However if relative submodules require 'origin' to exist AND also
>>> represent the upstream repository (in triangle workflow), then this
>>> breaks on several levels.
>>
>> Can you explain further?  In a triangle workflow, "git fetch" will
>> pull from the 'origin' remote by default and will push to the remote
>> named in the '[remote] pushdefault' setting (see "remote.pushdefault"
>> in git-config(1)).  So you can do
>>
>>         [remote]
>>                 pushDefault = whereishouldpush
>>
>> and then 'git fetch' and 'git fetch --recurse-submodules' will fetch
>> from "origin" and 'git push' will push to the whereishouldpush remote.
> 
> I didn't know about this option, seems useful.
> 
> A common workflow that we use on my team is to set the tracking branch
> to 'upstream' for convenient pulls with rebase. This means a feature
> branch of mine can track its parent branch on 'upstream', so that when
> other pull requests get merged in on the remote repo branch, I can
> just do `git pull` and my feature branch rebases onto the latest of
> that parent branch.
> 
> Cases like these would work with relative submodules because
> 'upstream' is the tracked remote (and most of the time we don't want
> to fork submodules). However sometimes I like to track the same pushed
> branch on origin (my fork), especially when it is up for pull request.
> In these cases, my submodule update will fail because I didn't fork my
> submodules when I changed my tracking branch. Is this correct?
> 
> "breaks on several levels" was basically my way of saying that various
> workflow choices will break when you introduce submodules. One of the
> beautiful things about Git is that it allows everyone to choose their
> own workflow. But submodules seem to prevent that to some degree. I
> think addressing relative submodule usability issues is the best
> approach for the long term as they feel more sustainable and scalable.
> It's an absolute pain to move a submodule URL, I think we've all
> experienced it. It's even harder for me because I'm the go-to at work
> for help with Git. Most people that aren't advanced with Git will not
> know what to do without a ton of reading & such.
> 
>> It might make sense to introduce a new
>>
>>         [remote]
>>                 default = whereishouldfetch
>>
>> setting to allow the name "origin" above to be replaced, too.  Is that
>> what you mean?

A couple of years ago I started to work on such a thing ([1] [2] [3]), mainly
because when we tried to change to relative submodules we got bitten when
someone used clone's -o option so that his super-repo had no "origin" remote
*and* it was checked out on a detached HEAD.  So get_default_remote() failed
for him.
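The failure described here can be modelled with a sketch of the `get_default_remote` logic quoted above, with its two `git` calls replaced by arguments so it runs without a repository. The `remote_exists` helper and the error message are hypothetical, not part of git-submodule.sh.

```sh
#!/bin/sh
# Sketch of the quoted get_default_remote logic, with the git calls replaced
# by arguments:
#   $1  output of `git symbolic-ref -q HEAD` (empty when HEAD is detached)
#   $2  value of branch.<branch>.remote       (empty when unset)
default_remote () {
	curr_branch="${1#refs/heads/}"
	origin="$2"
	# Detached HEAD: there is no branch, so no branch.<name>.remote can be
	# found and the lookup falls through to the literal name "origin".
	test -n "$curr_branch" || origin=
	echo "${origin:-origin}"
}

# Hypothetical helper: is remote $1 in the space-separated list $2?
remote_exists () {
	case " $2 " in
	*" $1 "*) return 0 ;;
	*) return 1 ;;
	esac
}

# The case above: cloned with `git clone -o upstream`, HEAD detached.
remote=$(default_remote "" "")
if remote_exists "$remote" "upstream"
then
	echo "using remote '$remote'"
else
	echo "no remote named '$remote'; cannot resolve relative submodule URLs"
fi
```

Both conditions are needed: a checked-out branch with `branch.<name>.remote = upstream` would have found the right remote, and a detached HEAD alone would still work as long as a remote literally named `origin` exists.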

I didn't have time to complete the work -- it ended up being quite involved.
But Junio did come up with an excellent transition plan [4] for adopting a
default remote setting.

[1] (v0) http://thread.gmane.org/gmane.comp.version-control.git/200145
[2] (v1) http://thread.gmane.org/gmane.comp.version-control.git/201065
[3] (v2) http://thread.gmane.org/gmane.comp.version-control.git/201306
[4] http://article.gmane.org/gmane.comp.version-control.git/201332

> I think you're on the right path. However I'd suggest something like
> the following:
> 
> [submodule]
>     remote = <remote_for_relative_submodules> (e.g. `upstream`)

I think remote.default would be more generally useful, especially when
working with detached checkouts.

(For the record, I would also be happy if clone got rid of its -o option and
"origin" became the sacred, reserved remote name (perhaps translated into
other languages as needed) that clone always uses no matter what.)

> [branch.<name>]
>     submoduleRemote = <remote_for_relative_submodule>

If I understand correctly, you want this so that your branch can be a fork of
only the super-repo while the submodules are not forked and so they should
keep tracking their original repo.

To me this seems to be going in the opposite direction of having branches
recursively apply to submodules, which I think most of us want.

A branch should fork the entire repo, including its submodules.  The
implication is that if you want to push that branch somewhere, that somewhere
needs to be able to accept the forks of the submodules *even if those
submodules aren't changed in your branch* because at the very least the
branch ref has to exist in the submodules' repositories.

With absolute-path submodules, the push is as simple as creating the branch
ref in the submodules' "home" repositories -- even if the main "somewhere"
you're pushing to isn't one of those repositories.

With relative-path submodules, the push's target repo *must* also have the
submodules in their proper places, so that they can get updated.
Furthermore, if you clone a repo that has relative-path submodules you *must*
also clone the submodules.

Robert, I think what you'll say to this is that you still want your branch to
track the latest submodules updates from their "home" repository. (BTW, I'm
confused with how you're using the terms "upstream" and "origin".  I'll use
"home" to refer to the repository where everything starts from, and "fork"
for the repository that your branch tracks.)  Well, you get the updates you
want when your branch tracks a ref in the "home" repository.  But when your
branch starts tracking a ref in another "fork" repository then you'll get the
submodule updates in that ref's history from that "fork" repository.

Once your branch is tracking the "fork" repository, if you do a pull you
won't get any submodule updates because the fork's branch hasn't changed.
You need to fetch (recursively) from the "home" repo to get the submodule
updates (assuming one of the "home" repo's branches has updated its
submodules).  Then, with your branch checked out in the super-repo, if you
check out the latest refs in your submodules git will tell you that you have
uncommitted changes in your branch.  The correct way to get submodule updates
into your branch is to commit them.  Even though you're doing a pull/rebase,
there's nothing to rebase onto in the "fork" repository that has the updated
submodules.

		M.

* Re: Re: Relative submodule URLs
  2014-08-22 16:00     ` Marc Branchaud
@ 2014-08-24 13:34       ` Heiko Voigt
  2014-08-25 14:29         ` Robert Dailey
  2014-08-25 13:48       ` Robert Dailey
  2014-08-28 17:44       ` Marc Branchaud
  2 siblings, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-24 13:34 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: Robert Dailey, Jonathan Nieder, Git, Jens Lehmann

Hi,

Since this mail got quite long, and to avoid 'tl;dr', I discuss two
topics in it:

  * Submodule settings for default remote (complex, future)
  * New --with-remote parameter for 'git submodule' (simple, now)

Depending on your interest you might want to skip the first part of the
email.

I think they are two separate topics. Please reply to only one of them
and remove the other. That way we split the thread here and do not mix
the two together anymore.

On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
> > I think you're on the right path. However I'd suggest something like
> > the following:
> > 
> > [submodule]
> >     remote = <remote_for_relative_submodules> (e.g. `upstream`)
> 
> I think remote.default would be more generally useful, especially when
> working with detached checkouts.

That depends on your workflow. Especially for submodules, where the
default remote might change from branch to branch, this is not
necessarily true. The following drawbacks in relation to submodules come
to mind:

  * You cannot transport such configuration to the server. In case
    you are developing on a branch which has changes in a forked
    submodule that would be useful.

  * When your development in superproject and submodule gets merged to a
    stable branch (i.e. master) you also may not want that other remote
    anymore. So a setting, that can be per branch, might be preferred.

  * When your development gets pushed to a different remote the settings
    do not change. I.e. once part of the upstream repository the
    settings should possibly disappear.

  * You might only want to fork a certain submodule (since that's the
    only one you need to make changes in) in your branch. Then you need
    this setting to be per submodule.

So to sum up a default remote setting which would be generally useful
for submodules needs the following properties (IMO):

  * pushable
  * per branch
  * per remote
  * per submodule

All of these should be optional, so in case you have a local mirror,
including submodules, of some project in which you develop with your
team, you might just want to set the default remote once for all
submodules.

I have not completely thought that through but the special ref idea[3]
described by Jonathan seems to make it possible to implement all these
properties.

> > [branch.<name>]
> >     submoduleRemote = <remote_for_relative_submodule>
> 
> If I understand correctly, you want this so that your branch can be a fork of
> only the super-repo while the submodules are not forked and so they should
> keep tracking their original repo.
> 
> To me this seems to be going in the opposite direction of having branches
> recursively apply to submodules, which I think most of us want.

I disagree. While recursive branches might make sense in some
situations, in most they do not. Consider a project in which you use a
library which is separately maintained. You develop on featureA in your
project and discover a bug in the submodule which you fix on a branch
(which is then tracked in the submodule). Here it does not make sense
to call your branch in the submodule featureA, since the submodule has no
knowledge at all (and should not) about this featureA.

Having said that, for a simple workflow while developing a certain
feature, recursive branches make sense. Let's say that, as a temporary local
branch, you could have that featureA branch in your submodule and just
commit any changes you need in the submodule on that branch (including
extensions and stuff). Later in the process you divide up that branch in
the submodule into cleanups, bugfixes, extensions, ...  to push it
upstream for review and integration.

> A branch should fork the entire repo, including its submodules.  The
> implication is that if you want to push that branch somewhere, that somewhere
> needs to be able to accept the forks of the submodules *even if those
> submodules aren't changed in your branch* because at the very least the
> branch ref has to exist in the submodules' repositories.

I disagree here as well. As the distributed nature of git allows to have
different remotes, I think it's perfectly legitimate to just fork the
repositories you need to change. It should be easy to work on a
repository that is forked in its entirety, but it should also be possible
(and properly supported) to only fork some submodules. I know it does
make the situation more complex, but I think we should properly define
the goal beforehand, so we do not exclude any use-cases. Then we can go
ahead and just implement the simpler stuff (like entire repo forks)
first, while making sure we do not block the more complex use-cases.

> With absolute-path submodules, the push is as simple as creating the branch
> ref in the submodules' "home" repositories -- even if the main "somewhere"
> you're pushing to isn't one of those repositories.
> 
> With relative-path submodules, the push's target repo *must* also have the
> submodules in their proper places, so that they can get updated.
> Furthermore, if you clone a repo that has relative-path submodules you *must*
> also clone the submodules.

That is not true. You can have relative submodules and just clone/fetch
some from a different remote. It's just a question of how to
specify/transport this information.

> Robert, I think what you'll say to this is that you still want your branch to
> track the latest submodules updates from their "home" repository. (BTW, I'm
> confused with how you're using the terms "upstream" and "origin".  I'll use
> "home" to refer to the repository where everything starts from, and "fork"
> for the repository that your branch tracks.)  Well, you get the updates you
> want when your branch tracks a ref in the "home" repository.  But when your
> branch starts tracking a ref in another "fork" repository then you'll get the
> submodule updates in that ref's history from that "fork" repository.
> 
> Once your branch is tracking the "fork" repository, if you do a pull you
> won't get any submodule updates because the fork's branch hasn't changed.
> You need to fetch (recursively) from the "home" repo to get the submodule
> updates (assuming one of the "home" repo's branches has updated its
> submodules).  Then, with your branch checked out in the super-repo, if you
> check out the latest refs in your submodules git will tell you that you have
> uncommitted changes in your branch.  The correct way to get submodule updates
> into your branch is to commit them.  Even though you're doing a pull/rebase,
> there's nothing to rebase onto in the "fork" repository that has the updated
> submodules.

Let's please use the terms as Junio described them here[1]. Do not add
confusion by introducing new terms when we do not need them. If we need
new ones, let's introduce them properly. Maybe even define them in the
glossary. I feel that for some things this is needed when talking about
submodules, so that we talk about the same things.

New --with-remote parameter for 'git submodule'
-----------------------------------------------

Having said all that about submodule settings, I think a much,
much simpler start is to go ahead with a command-line setting, like
Robert proposed here[2].

For that we do not have to worry about how it can be stored,
transported, defined per submodule or on a branch, since answers to this
are given at the commandline (and current repository state).

There are still open questions about this though:

  * Should the name in the submodule be 'origin' even though you
    specified --with-remote=somewhere? For me it's always confusing to
    have the same/similar remotes named differently in different
    repositories. That's why I try to keep the names the same in all my
    clones of repositories (i.e. for my private, github, upstream
    remotes).

  * When you do a 'git submodule sync --with-remote=somewhere', should
    the remote be added or replaced?

My opinions on these are:

The remote should be named as in the superproject so
--with-remote=somewhere adds/replaces the remote 'somewhere' in the
submodules named on the commandline (or all in case no submodule is
specified). In case of a fresh clone of the submodule, there would be no
origin but only a remote under the new name.
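The naming rule above can be sketched as a dry run that prints the command it would issue inside an already cloned submodule. `--with-remote` is only a proposal here, and the function name and arguments are assumptions; for a fresh clone the equivalent would presumably be cloning with `git clone -o <name>`, so that no `origin` is created at all.

```sh
#!/bin/sh
# Dry-run sketch of the proposed --with-remote=<name> naming rule for an
# already cloned submodule. Nothing is executed; the command is printed.
#   $1  remote name given to --with-remote (e.g. "somewhere")
#   $2  submodule URL, already resolved against that superproject remote
#   $3  space-separated list of remotes currently configured in the submodule
plan_with_remote () {
	case " $3 " in
	*" $1 "*)
		# The submodule already has a remote of that name: replace its URL.
		echo "git remote set-url $1 $2"
		;;
	*)
		# Otherwise add it under the superproject's remote name.
		echo "git remote add $1 $2"
		;;
	esac
}

plan_with_remote somewhere https://example.com/somewhere/lib.git "origin"
# prints: git remote add somewhere https://example.com/somewhere/lib.git
```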

Would the --with-remote feature I describe be a feasible start for you
Robert? What do others think? Is the naming of the parameter
'--with-remote' alright?

Cheers Heiko

[1] http://article.gmane.org/gmane.comp.version-control.git/255512
[2] http://article.gmane.org/gmane.comp.version-control.git/255512
[3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values

* Re: Re: Relative submodule URLs
  2014-08-24 13:34       ` Heiko Voigt
@ 2014-08-25 14:29         ` Robert Dailey
  2014-08-25 14:32           ` Robert Dailey
  2014-08-26  6:28           ` Heiko Voigt
  0 siblings, 2 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-25 14:29 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann

On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> New --with-remote parameter for 'git submodule'
> -----------------------------------------------
>
> Having said all that about submodule settings, I think a much,
> much simpler start is to go ahead with a command-line setting, like
> Robert proposed here[2].
>
> For that we do not have to worry about how it can be stored,
> transported, defined per submodule or on a branch, since answers to this
> are given at the commandline (and current repository state).
>
> There are still open questions about this though:
>
>   * Should the name in the submodule be 'origin' even though you
>     specified --with-remote=somewhere? For me it's always confusing to
>     have the same/similar remotes named differently in different
>     repositories. That's why I try to keep the names the same in all my
>     clones of repositories (i.e. for my private, github, upstream
>     remotes).
>
>   * When you do a 'git submodule sync --with-remote=somewhere', should
>     the remote be added or replaced?
>
> My opinions on these are:
>
> The remote should be named as in the superproject so
> --with-remote=somewhere adds/replaces the remote 'somewhere' in the
> submodules named on the commandline (or all in case no submodule is
> specified). In case of a fresh clone of the submodule, there would be no
> origin but only a remote under the new name.
>
> Would the --with-remote feature I describe be a feasible start for you
> Robert? What do others think? Is the naming of the parameter
> '--with-remote' alright?
>
> Cheers Heiko
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/255512
> [2] http://article.gmane.org/gmane.comp.version-control.git/255512
> [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values

Hi Heiko,

My last email response was in violation of your request to keep the
two topics separate, sorry about that. I started typing it this
weekend and completed the draft this morning, without having read this
response from you first. At this point my only intention was to start
discussion on a possible short-term solution. I realize the Git
developers are working hard on improving submodule workflow for the
long term. In addition I do not have the domain expertise to properly
make suggestions in regards to longer-term solutions, so I leave that
to you :-)

The --with-remote feature would allow me to begin using relative
submodules because:

On a per-submodule basis, I can specify the remote it will use. When I
fork a submodule and need to start tracking it, I can run `git
submodule sync --with-remote fork`, which will take my super repo's
'fork' remote, REPLACE 'origin' in the submodule with that URL, and
also redo the relative URL calculation. This is ideal since I use HTTP
at home (so I can use my proxy server to access git behind the firewall at
work), while physically at work I use SSH for performance (to avoid the HTTP
protocol). I also like the idea of "never" having to update my
submodule URLs again if the git server moves, domain name changes, or
whatever else.
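To illustrate why relative URLs survive a change of transport or server, here is a simplified sketch of how a relative `.gitmodules` URL is combined with the superproject's remote URL. It is modelled loosely on the `resolve_relative_url` helper in git-submodule.sh (the real helper also looks up the remote itself and handles more cases), and the example URLs are made up.

```sh
#!/bin/sh
# Simplified sketch: combine a relative .gitmodules URL with the URL of the
# superproject's remote. Each leading "../" strips the last "/"-separated
# component off the remote URL (or, for an scp-style host:path URL with no
# "/" left, the part after the colon); a leading "./" is skipped.
#   $1  superproject remote URL (e.g. the value of remote.fork.url)
#   $2  relative URL from .gitmodules
resolve_relative_url () {
	remoteurl="${1%/}"
	url="$2"
	sep=/
	while test -n "$url"
	do
		case "$url" in
		../*)
			url="${url#../}"
			case "$remoteurl" in
			*/*) remoteurl="${remoteurl%/*}" ;;
			*:*) remoteurl="${remoteurl%:*}"; sep=: ;;
			*)
				echo "cannot strip a component off '$remoteurl'" >&2
				return 1
				;;
			esac
			;;
		./*)
			url="${url#./}"
			;;
		*)
			break
			;;
		esac
	done
	echo "$remoteurl$sep${url%/}"
}

# The same .gitmodules entry under an HTTP remote and an SSH remote:
resolve_relative_url https://git.example.com/me/super.git ../lib.git
resolve_relative_url git@git.example.com:me/super.git ../lib.git
```

This prints `https://git.example.com/me/lib.git` and then `git@git.example.com:me/lib.git`: switching the superproject's remote between HTTP and SSH switches the derived submodule URLs with it.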

Here is what I think would make the feature most usable. I think you
went over some of these ideas but I just want to clarify, to make sure
we're on the same page. Please correct me as needed.

1. Running `git submodule update --with-remote <name>` shall fail the
command unconditionally.
2. Using the `--with-remote` option on submodule `update` or `sync`
will fail if it detects absolute submodule URLs in .gitmodules
3. Running `git submodule update --init --with-remote <name>` shall
fail the command ONLY if a submodule is being processed that is NOT
also being initialized.
4. The behavior of git submodule's `update` or `sync` commands
combined with `--with-remote` will REPLACE or CREATE the 'origin'
remote in each submodule it is run in. We will not allow the user to
configure what the submodule remote name will end up being (I think
this is current behavior and forces good practice; I consider `origin`
an adopted standard for git, and actually wish it was more enforced
for super projects as well!)
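Rules 1-3 can be sketched as a validation function. Everything here is hypothetical: `--with-remote` does not exist yet, and the argument layout and messages are assumptions made so the checks can run without a repository. Rule 4 (always naming the remote `origin`) concerns what gets written rather than what is accepted, so it is only noted in a comment.

```sh
#!/bin/sh
# Hypothetical checks for the proposed --with-remote option.
# check_with_remote returns 0 if the option is acceptable for this
# submodule, 1 otherwise, with a made-up message on stderr.
#   $1  subcommand: "update" or "sync"
#   $2  "yes" if --init was given, else "no"
#   $3  "yes" if this submodule is being initialized in this run, else "no"
#   $4  the submodule's URL from .gitmodules

# Relative submodule URLs start with "./" or "../".
is_relative_url () {
	case "$1" in
	./*|../*) return 0 ;;
	*) return 1 ;;
	esac
}

check_with_remote () {
	# Rule 2: absolute URLs cannot be re-based onto another remote.
	if ! is_relative_url "$4"
	then
		echo "absolute URL '$4' cannot be used with --with-remote" >&2
		return 1
	fi
	if test "$1" = update
	then
		# Rule 1: update without --init is always rejected.
		if test "$2" != yes
		then
			echo "update --with-remote requires --init" >&2
			return 1
		fi
		# Rule 3: with --init, reject submodules not being initialized now.
		if test "$3" != yes
		then
			echo "submodule is already initialized" >&2
			return 1
		fi
	fi
	# Rule 4 is not a check: the remote written into the submodule would
	# always be named "origin".
	return 0
}
```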

Let me know if I've missed anything. Once we clarify requirements I'll
attempt to start work on this during my free time. I'll start by
testing this through msysgit, since I do not have Linux installed natively,
but I have Linux Mint running in a virtual machine, so I can test on both
platforms as needed (I don't have a lot of experience on Linux
though).

I hope you won't mind me reaching out for questions as needed, however
I will attempt to be as resourceful as possible since I know you're
all busy. Thanks.

* Re: Re: Relative submodule URLs
  2014-08-25 14:29         ` Robert Dailey
@ 2014-08-25 14:32           ` Robert Dailey
  2014-08-26  6:28           ` Heiko Voigt
  1 sibling, 0 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-25 14:32 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann

On Mon, Aug 25, 2014 at 9:29 AM, Robert Dailey <rcdailey.lists@gmail.com> wrote:
> On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
>> New --with-remote parameter for 'git submodule'
>> -----------------------------------------------
>>
>> Having said all that about submodule settings, I think a much,
>> much simpler start is to go ahead with a command-line setting, like
>> Robert proposed here[2].
>>
>> For that we do not have to worry about how it can be stored,
>> transported, defined per submodule or on a branch, since answers to this
>> are given at the commandline (and current repository state).
>>
>> There are still open questions about this though:
>>
>>   * Should the name in the submodule be 'origin' even though you
>>     specified --with-remote=somewhere? For me it's always confusing to
>>     have the same/similar remotes named differently in different
>>     repositories. That's why I try to keep the names the same in all my
>>     clones of repositories (i.e. for my private, github, upstream
>>     remotes).
>>
>>   * When you do a 'git submodule sync --with-remote=somewhere', should
>>     the remote be added or replaced?
>>
>> My opinions on these are:
>>
>> The remote should be named as in the superproject so
>> --with-remote=somewhere adds/replaces the remote 'somewhere' in the
>> submodules named on the commandline (or all in case no submodule is
>> specified). In case of a fresh clone of the submodule, there would be no
>> origin but only a remote under the new name.
>>
>> Would the --with-remote feature I describe be a feasible start for you
>> Robert? What do others think? Is the naming of the parameter
>> '--with-remote' alright?
>>
>> Cheers Heiko
>>
>> [1] http://article.gmane.org/gmane.comp.version-control.git/255512
>> [2] http://article.gmane.org/gmane.comp.version-control.git/255512
>> [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values
>
> Hi Heiko,
>
> My last email response was in violation of your request to keep the
> two topics separate, sorry about that. I started typing it this
> weekend and completed the draft this morning, without having read this
> response from you first. At this point my only intention was to start
> discussion on a possible short-term solution. I realize the Git
> developers are working hard on improving submodule workflow for the
> long term. In addition I do not have the domain expertise to properly
> make suggestions in regards to longer-term solutions, so I leave that
> to you :-)
>
> The --with-remote feature would allow me to begin using relative
> submodules because:
>
> On a per-submodule basis, I can specify the remote it will use. When I
> fork a submodule and need to start tracking it, I can run `git
> submodule sync --with-remote fork`, which will take my super repo's
> 'fork' remote, REPLACE 'origin' in the submodule with that URL, and
> also redo the relative URL calculation. This is ideal because I use HTTP
> at home (so I can go through my proxy server to reach git behind the
> firewall at work), while physically at work I use SSH for performance
> (to avoid the HTTP protocol). I also like the idea of "never" having to
> update my submodule URLs again if the git server moves, its domain name
> changes, or whatever else.
>
> Here is what I think would make the feature most usable. I think you
> went over some of these ideas but I just want to clarify, to make sure
> we're on the same page. Please correct me as needed.
>
> 1. Running `git submodule update --with-remote <name>` shall fail the
> command unconditionally.
> 2. Using the `--with-remote` option on submodule `update` or `sync`
> will fail if it detects absolute submodule URLs in .gitmodules
> 3. Running `git submodule update --init --with-remote <name>` shall
> fail the command ONLY if a submodule is being processed that is NOT
> also being initialized.
> 4. The behavior of git submodule's `update` or `sync` commands
> combined with `--with-remote` will REPLACE or CREATE the 'origin'
> remote in each submodule it is run in. We will not allow the user to
> configure what the submodule remote name will end up being (I think
> this is current behavior and forces good practice; I consider `origin`
> an adopted standard for git, and actually wish it was more enforced
> for super projects as well!)
>
> Let me know if I've missed anything. Once we clarify the requirements, I'll
> start working on this in my free time. I'll begin by
> testing through msysgit, since I do not have Linux installed natively, but
> I have Linux Mint running in a virtual machine, so I can test on both
> platforms as needed (I don't have a lot of experience with Linux,
> though).
>
> I hope you won't mind me reaching out with questions as needed; however,
> I will try to be as resourceful as possible, since I know you're
> all busy. Thanks.

Thought of a few more:

5. If `--with-remote` is unspecified, behavior will continue as it
currently does (I'm not clear on the precedence of the various
options here, but I assume `remote.default` first, then
`branch.<name>.remote`).
6. `--with-remote` will take precedence over `remote.default` and
`branch.<name>.remote`.

I'll add more as I think of them... Sorry for the spam.

* Re: Re: Re: Relative submodule URLs
  2014-08-25 14:29         ` Robert Dailey
  2014-08-25 14:32           ` Robert Dailey
@ 2014-08-26  6:28           ` Heiko Voigt
  2014-08-26 15:18             ` Robert Dailey
  1 sibling, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-26  6:28 UTC
  To: Robert Dailey; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann

On Mon, Aug 25, 2014 at 09:29:07AM -0500, Robert Dailey wrote:
> On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> > New --with-remote parameter for 'git submodule'
> > -----------------------------------------------
> >
> > While having said all that about submodule settings I think a much
> > much simpler start is to go ahead with a commandline setting, like
> > Robert proposed here[2].
> >
> > For that we do not have to worry about how it can be stored,
> > transported, defined per submodule or on a branch, since answers to this
> > are given at the commandline (and current repository state).
> >
> > There are still open questions about this though:
> >
> >   * Should the name in the submodule be 'origin' even though you
> >     specified --with-remote=somewhere? For me it's always confusing to
> >     have the same/similar remotes named differently in different
> >     repositories. That's why I try to keep the names the same in all my
> >     clones of repositories (e.g. for my private, GitHub, and upstream
> >     remotes).
> >
> >   * When you do a 'git submodule sync --with-remote=somewhere', should
> >     the remote be added or replaced?
> >
> > My opinion on these is:
> >
> > The remote should be named as in the superproject so
> > --with-remote=somewhere adds/replaces the remote 'somewhere' in the
> > submodules named on the commandline (or all in case no submodule is
> > specified). In case of a fresh clone of the submodule, there would be no
> > origin but only a remote under the new name.
> >
> > Would the --with-remote feature I describe be a feasible start for you
> > Robert? What do others think? Is the naming of the parameter
> > '--with-remote' alright?
> >
> > Cheers Heiko
> >
> > [1] http://article.gmane.org/gmane.comp.version-control.git/255512
> > [2] http://article.gmane.org/gmane.comp.version-control.git/255512
> > [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values
> 
> Hi Heiko,
> 
> My last email response was in violation of your request to keep the
> two topics separate, sorry about that. I started typing it this
> weekend and completed the draft this morning, without having read this
> response from you first.

That's fine, no problem.

> Here is what I think would make the feature most usable. I think you
> went over some of these ideas but I just want to clarify, to make sure
> we're on the same page. Please correct me as needed.
> 
> 1. Running `git submodule update --with-remote <name>` shall fail the
> command unconditionally.

I am not sure, but I think you mean

	git submodule update --with-remote=<name>

with the equals sign; without it, <name> would be taken as a submodule
path to update. And no, I do not think it should fail: it should just
add the remote <name> to all submodules that would be updated and do
the normal update operation on them (with the new remote, of course).

> 2. Using the `--with-remote` option on submodule `update` or `sync`
> will fail if it detects absolute submodule URLs in .gitmodules

Yes, almost. Since .gitmodules can contain a mixture, I suggest failing
only if the submodules that would be processed have an absolute URL. If
the processed submodules are all relative, it can go ahead.

> 3. Running `git submodule update --init --with-remote <name>` shall
> fail the command ONLY if a submodule is being processed that is NOT
> also being initialized.

No, since the --init flag just tells update to initialize submodules
on demand. It should go ahead the same way as without
--with-remote.

> 4. The behavior of git submodule's `update` or `sync` commands
> combined with `--with-remote` will REPLACE or CREATE the 'origin'
> remote in each submodule it is run in. We will not allow the user to
> configure what the submodule remote name will end up being (I think
> this is current behavior and forces good practice; I consider `origin`
> an adopted standard for git, and actually wish it was more enforced
> for super projects as well!)

No, please read my email again carefully. I was specifically describing
the opposite: --with-remote=<name> creates/replaces the remote <name> in
the submodule. I do not see a benefit in restricting the user from
creating different remote names in the submodule. I think it would be
more confusing if the remote 'origin' in the superproject did not point
to the same location as 'origin' in the submodule.
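For illustration only, here is a minimal shell sketch of the create-or-replace semantics described above, as they might look inside a single submodule. The function name `add_or_replace_remote` and its arguments are hypothetical (no `--with-remote` option exists in git-submodule.sh); it only uses existing commands (`git config --get`, `git remote add`, `git remote set-url`, and `git -C`, which needs git 1.8.5 or later) and assumes the URL has already been resolved.

```shell
#!/bin/sh
# Hypothetical helper for the proposed --with-remote=<name> option.
# $1 = submodule work tree, $2 = remote name, $3 = already-resolved URL
add_or_replace_remote () {
	sm_dir=$1
	name=$2
	url=$3
	if git -C "$sm_dir" config --get "remote.$name.url" >/dev/null; then
		# remote <name> already exists: replace only its URL
		git -C "$sm_dir" remote set-url "$name" "$url"
	else
		# remote <name> is missing: create it; 'origin' is left untouched
		git -C "$sm_dir" remote add "$name" "$url"
	fi
}
```

Because 'origin' is never renamed or removed, a submodule can keep several remotes side by side, which is the workflow argued for above.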

> Let me know if I've missed anything. Once we clarify the requirements, I'll
> start working on this in my free time. I'll begin by
> testing through msysgit, since I do not have Linux installed natively, but
> I have Linux Mint running in a virtual machine, so I can test on both
> platforms as needed (I don't have a lot of experience with Linux,
> though).

I don't think it matters which development environment you use. In my
experience, though, Linux is around 30x faster for the typical
operations you do when developing git. Especially when running the
test suite, that is the difference between a few hours and a few minutes.

> I hope you won't mind me reaching out with questions as needed; however,
> I will try to be as resourceful as possible, since I know you're
> all busy. Thanks.

No problem, just post here and we will see.


On Mon, Aug 25, 2014 at 09:32:27AM -0500, Robert Dailey wrote:
> Thought of a few more:
> 
> 5. If `--with-remote` is unspecified, behavior will continue as it
> currently does (I'm not clear on the precedence of the various
> options here, but I assume `remote.default` first, then
> `branch.<name>.remote`).

Yes. I hope the test suite covers this case well enough, so run it to
make sure. Have a look at what kind of tests exist, and maybe even write
one or two for the code you change. That's good practice to start with,
and it also makes sure you do not break existing behavior.

Johan Herland also recently collected some update tests here [1].

AFAIK, remote.default was WIP and does not exist yet. So you only need
to worry 

> 6. `--with-remote` will take precedence over `remote.default` and
> `branch.<name>.remote`.

Yes.
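Points 5 and 6 amount to a simple precedence order. Below is a minimal sketch of it, modeled on git-submodule.sh's `get_default_remote`. The function name `pick_remote_sketch` is hypothetical, and `$2` stands in for the value of the proposed `--with-remote` option. `remote.default` is deliberately left out, since it does not exist.

```shell
#!/bin/sh
# Hypothetical: pick the superproject remote that relative submodule
# URLs are resolved against, following points 5 and 6 above.
# $1 = superproject work tree
# $2 = value of the proposed --with-remote option ("" if not given)
pick_remote_sketch () {
	dir=$1
	with_remote=$2
	# 1. an explicit --with-remote always wins
	if test -n "$with_remote"; then
		echo "$with_remote"
		return 0
	fi
	# 2. otherwise branch.<name>.remote of the checked-out branch
	#    (symbolic-ref fails on a detached HEAD, leaving curr_branch empty)
	curr_branch=$(git -C "$dir" symbolic-ref -q HEAD) || curr_branch=
	curr_branch=${curr_branch#refs/heads/}
	remote=
	if test -n "$curr_branch"; then
		remote=$(git -C "$dir" config --get "branch.$curr_branch.remote") || remote=
	fi
	# 3. finally, the hardcoded "origin"
	echo "${remote:-origin}"
}
```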

> I'll add more as I think of them... Sorry for the spam.

I think the code for the new command-line switch will not be too
complicated or big, so I think it's best if you just go ahead, write it,
and then send a patch to the list once you are happy with it. It's common
to add an RFC tag to the subject if you just want some comments on your
current status and do not think it's ready for inclusion yet. Expect it
to go a few rounds until everything is ironed out.

Cheers Heiko

[1] http://thread.gmane.org/gmane.comp.version-control.git/246312

* Re: Re: Re: Relative submodule URLs
  2014-08-26  6:28           ` Heiko Voigt
@ 2014-08-26 15:18             ` Robert Dailey
  2014-08-26 20:34               ` Heiko Voigt
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Dailey @ 2014-08-26 15:18 UTC
  To: Heiko Voigt; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 26, 2014 at 1:28 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
>> Hi Heiko,
>>
>> My last email response was in violation of your request to keep the
>> two topics separate, sorry about that. I started typing it this
>> weekend and completed the draft this morning, without having read this
>> response from you first.
>
> That's fine, no problem.
>
>> Here is what I think would make the feature most usable. I think you
>> went over some of these ideas but I just want to clarify, to make sure
>> we're on the same page. Please correct me as needed.
>>
>> 1. Running `git submodule update --with-remote <name>` shall fail the
>> command unconditionally.
>
> I am not sure, but I think you mean
>
>         git submodule update --with-remote=<name>
>
> with the equals sign; without it, <name> would be taken as a submodule
> path to update. And no, I do not think it should fail: it should just
> add the remote <name> to all submodules that would be updated and do
> the normal update operation on them (with the new remote, of course).

I'm not sure about Linux, but at least with msysgit on Windows, typing
a two-dash option (such as --with-remote) makes command line parsing
use the next positional argument as that option's value. I've seen
argparse in Python work the same way too. In my experience, command lines
have worked that way; I'm not sure whether that is by design, though.
I never use equals signs with git commands and have never had a problem.

For example:

git rebase --onto release/1.0 HEAD~3 HEAD

The `--onto` option knows to use `release/1.0` as its parameter.
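Whether the space-separated form works is decided by each command's own option parser, not by the operating system. git-submodule.sh parses its options with a shell `while`/`case` loop, so a new option can be written to accept both spellings. Here is a minimal sketch for the proposed (not yet existing) option, using a hypothetical `parse_with_remote` function that sets `$with_remote` and `$sm_paths`:

```shell
#!/bin/sh
# Hypothetical option parser for the proposed --with-remote option,
# written in the while/case style used by git-submodule.sh.
# Sets $with_remote and $sm_paths (the remaining arguments).
parse_with_remote () {
	with_remote=
	sm_paths=
	while test $# -ne 0
	do
		case "$1" in
		--with-remote=*)
			# "--with-remote=fork": the value is glued to the option
			with_remote=${1#--with-remote=}
			;;
		--with-remote)
			# "--with-remote fork": the value is the next word
			if test $# -lt 2; then
				echo "error: --with-remote requires a value" >&2
				return 1
			fi
			with_remote=$2
			shift
			;;
		--)
			shift
			break
			;;
		-*)
			echo "error: unknown option: $1" >&2
			return 1
			;;
		*)
			break
			;;
		esac
		shift
	done
	# whatever is left is the list of submodule paths
	sm_paths="$*"
}
```

The trade-off is that with the space-separated form, the word after `--with-remote` is always consumed as the remote name and can never be a submodule path.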

>> 2. Using the `--with-remote` option on submodule `update` or `sync`
>> will fail if it detects absolute submodule URLs in .gitmodules
>
> Yes, almost. Since .gitmodules can contain a mixture, I suggest failing
> only if the submodules that would be processed have an absolute URL. If
> the processed submodules are all relative, it can go ahead.

For example if it processes 3 submodules in the following order:

1. relative
2. absolute
3. relative

Should it fail before or after processing the 3rd relative submodule?
I was thinking it would fail while trying to sync/update the 2nd one
(which is absolute) and stop before processing the 3rd.

>> 3. Running `git submodule update --init --with-remote <name>` shall
>> fail the command ONLY if a submodule is being processed that is NOT
>> also being initialized.
>
> No, since the --init flag just tells update to initialize submodules
> on demand. It should go ahead the same way as without
> --with-remote.

But doesn't the on-demand initialization need to evaluate relative
URLs and convert them to absolute based on the .gitmodules
configuration? I thought the idea was to make `--with-remote` invalid
for initialization/sync of absolute URLs.

In other words if I did:

git submodule init --with-remote fork my-submodule-dir

and if my-submodule-dir's URL was not relative in .gitmodules, then the
`--with-remote` flag becomes useless. We could silently ignore it, but
for the user's benefit I thought we were failing in these
scenarios. Maybe I misunderstood your original intent with the
failures? Is init not doing the relative-to-absolute evaluation like
I'm thinking? Please correct me where I'm wrong.

>> 4. The behavior of git submodule's `update` or `sync` commands
>> combined with `--with-remote` will REPLACE or CREATE the 'origin'
>> remote in each submodule it is run in. We will not allow the user to
>> configure what the submodule remote name will end up being (I think
>> this is current behavior and forces good practice; I consider `origin`
>> an adopted standard for git, and actually wish it was more enforced
>> for super projects as well!)
>
> No, please read my email again carefully. I was specifically describing
> the opposite: --with-remote=<name> creates/replaces the remote <name> in
> the submodule. I do not see a benefit in restricting the user from
> creating different remote names in the submodule. I think it would be
> more confusing if the remote 'origin' in the superproject did not point
> to the same location as 'origin' in the submodule.

Well, the reason I said it would be 'origin' is so that the
submodule knows which remote to use internally during an update. I'm
assuming 'update' uses 'origin' internally in the submodule to know
which remote to pull from. My understanding of how `git submodule
update` knows which URL to pull from is probably incorrect. I'm not
familiar with the internal mechanics of how this works. Perhaps you
could explain or point me to some reading material on it?

* Re: Re: Re: Re: Relative submodule URLs
  2014-08-26 15:18             ` Robert Dailey
@ 2014-08-26 20:34               ` Heiko Voigt
  0 siblings, 0 replies; 25+ messages in thread
From: Heiko Voigt @ 2014-08-26 20:34 UTC
  To: Robert Dailey; +Cc: Marc Branchaud, Jonathan Nieder, Git, Jens Lehmann

On Tue, Aug 26, 2014 at 10:18:48AM -0500, Robert Dailey wrote:
> On Tue, Aug 26, 2014 at 1:28 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> >> My last email response was in violation of your request to keep the
> >> two topics separate, sorry about that. I started typing it this
> >> weekend and completed the draft this morning, without having read this
> >> response from you first.
> >
> > That's fine, no problem.
> >
> >> Here is what I think would make the feature most usable. I think you
> >> went over some of these ideas but I just want to clarify, to make sure
> >> we're on the same page. Please correct me as needed.
> >>
> >> 1. Running `git submodule update --with-remote <name>` shall fail the
> >> command unconditionally.
> >
> > I am not sure, but I think you mean
> >
> >         git submodule update --with-remote=<name>
> >
> > with the equals sign; without it, <name> would be taken as a submodule
> > path to update. And no, I do not think it should fail: it should just
> > add the remote <name> to all submodules that would be updated and do
> > the normal update operation on them (with the new remote, of course).
> 
> I'm not sure about Linux, but at least with msysgit on Windows, typing
> a two-dash option (such as --with-remote) makes command line parsing
> use the next positional argument as that option's value. I've seen
> argparse in Python work the same way too. In my experience, command lines
> have worked that way; I'm not sure whether that is by design, though.
> I never use equals signs with git commands and have never had a problem.
>
> For example:
> 
> git rebase --onto release/1.0 HEAD~3 HEAD
> 
> The `--onto` option knows to use `release/1.0` as its parameter.

Whether you are on Windows or Linux makes no difference here. I just
realized that our documentation is quite inconsistent about this:

$ git grep -E -e "--\w+=<\w+>" -- Documentation/ | wc -l
     226

$ git grep -E -e "--\w+ <\w+>" -- Documentation/ | wc -l
      75

I would prefer the first form, though, since it is used more often. But
we can leave that for later, once we have some code to talk about.

> >> 2. Using the `--with-remote` option on submodule `update` or `sync`
> >> will fail if it detects absolute submodule URLs in .gitmodules
> >
> > Yes, almost. Since .gitmodules can contain a mixture, I suggest failing
> > only if the submodules that would be processed have an absolute URL. If
> > the processed submodules are all relative, it can go ahead.
> 
> For example if it processes 3 submodules in the following order:
> 
> 1. relative
> 2. absolute
> 3. relative
> 
> Should it fail before or after processing the 3rd relative submodule?
> I was thinking it would fail while trying to sync/update the 2nd one
> (which is absolute) and stop before processing the 3rd.

For consistency, I would prefer it to fail right at the beginning in
this situation, since the command cannot be completed.
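Failing right at the beginning means checking every selected submodule before processing any of them. Below is a minimal two-pass sketch. It assumes, as git-submodule.sh does, that a URL counts as relative only when it starts with `./` or `../`. The function name `check_all_relative` is hypothetical. Its input is lines of `<key> <url>`, the shape printed by `git config -f .gitmodules --get-regexp '^submodule\..*\.url$'`.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the proposed --with-remote option.
# Reads "<key> <url>" lines on stdin and fails before anything is
# processed if any of the listed submodules has an absolute URL.
check_all_relative () {
	bad=
	while read -r key url; do
		case "$url" in
		./*|../*)
			;;	# relative: usable with --with-remote
		*)
			bad="$bad $key"
			;;
		esac
	done
	if test -n "$bad"; then
		echo "error: --with-remote needs relative URLs; absolute:$bad" >&2
		return 1
	fi
	return 0
}
```

In a real implementation, only the submodules selected on the command line would be fed in, so a .gitmodules file that mixes relative and absolute URLs still works as long as the processed ones are relative.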

> >> 3. Running `git submodule update --init --with-remote <name>` shall
> >> fail the command ONLY if a submodule is being processed that is NOT
> >> also being initialized.
> >
> > No, since the --init flag just tells update to initialize submodules
> > on demand. It should go ahead the same way as without
> > --with-remote.
> 
> But doesn't the on-demand initialization need to evaluate relative
> URLs and convert them to absolute based on the .gitmodules
> configuration? I thought the idea was to make `--with-remote` invalid
> for initialization/sync of absolute URLs.
> 
> In other words if I did:
> 
> git submodule init --with-remote fork my-submodule-dir
> 
> and if my-submodule-dir's URL was not relative in .gitmodules, then the
> `--with-remote` flag becomes useless. We could silently ignore it, but
> for the user's benefit I thought we were failing in these
> scenarios. Maybe I misunderstood your original intent with the
> failures? Is init not doing the relative-to-absolute evaluation like
> I'm thinking? Please correct me where I'm wrong.

Yes, it does the relative-to-absolute evaluation, but that is a different
topic. For absolute URLs in .gitmodules it should fail, but you were
talking about --init in general, and in general that should not fail IMO.
So, e.g.,

	git submodule update --init --with-remote=<name>

should succeed when all submodule URLs in .gitmodules are relative and
some submodules have already been initialized.
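The relative-to-absolute evaluation is where the chosen remote matters: the superproject remote's URL is the base that `./` and `../` are resolved against. Below is a simplified sketch with a hypothetical function name. It assumes `/`-separated URLs (https://, ssh://, local paths). git's own resolver also handles scp-style `host:path` URLs and other edge cases that are left out here.

```shell
#!/bin/sh
# Hypothetical, simplified resolver (not git's own code).
# $1 = superproject remote URL, e.g. $(git config --get remote.fork.url)
# $2 = relative URL from .gitmodules, e.g. ../lib.git
# Only "/"-separated URLs are handled; scp-style host:path is not.
resolve_relative_url_sketch () {
	remote_url=${1%/}	# drop one trailing slash
	rel=$2
	while :; do
		case "$rel" in
		../*)
			# each "../" strips one path component from the base
			rel=${rel#../}
			remote_url=${remote_url%/*}
			;;
		./*)
			# "./" means "below the superproject URL itself"
			rel=${rel#./}
			;;
		*)
			break
			;;
		esac
	done
	printf '%s/%s\n' "$remote_url" "$rel"
}
```

With `--with-remote=fork`, the base would come from `remote.fork.url`. The same `../lib.git` entry would then yield an HTTP URL on a clone whose remote uses HTTP and an SSH URL on one that uses SSH, which is the home/work case described earlier in the thread.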

> >> 4. The behavior of git submodule's `update` or `sync` commands
> >> combined with `--with-remote` will REPLACE or CREATE the 'origin'
> >> remote in each submodule it is run in. We will not allow the user to
> >> configure what the submodule remote name will end up being (I think
> >> this is current behavior and forces good practice; I consider `origin`
> >> an adopted standard for git, and actually wish it was more enforced
> >> for super projects as well!)
> >
> > No, please read my email again carefully. I was specifically describing
> > the opposite: --with-remote=<name> creates/replaces the remote <name> in
> > the submodule. I do not see a benefit in restricting the user from
> > creating different remote names in the submodule. I think it would be
> > more confusing if the remote 'origin' in the superproject did not point
> > to the same location as 'origin' in the submodule.
> 
> Well, the reason I said it would be 'origin' is so that the
> submodule knows which remote to use internally during an update. I'm
> assuming 'update' uses 'origin' internally in the submodule to know
> which remote to pull from. My understanding of how `git submodule
> update` knows which URL to pull from is probably incorrect. I'm not
> familiar with the internal mechanics of how this works. Perhaps you
> could explain or point me to some reading material on it?

Yes, your assumptions are almost right, except that submodule update
does not do a pull but a fetch (without any arguments) by default. Your
implementation could, however, change the fetch in submodule update to
fetch from all remotes before updating if no remote is specified (I
actually thought at first that this was already the case). But I have
not thought that through. Additionally, --with-remote could be used to
specify which remote to fetch from. As for reading material, I can only
point you to the code of git-submodule.sh.
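To make the options concrete, here is a minimal sketch of only the fetch step inside one submodule. It is not the code in git-submodule.sh. The function name and its arguments are hypothetical, and `$2` stands in for the proposed `--with-remote` value. The "fetch from all remotes" variant uses the existing `git fetch --all`, and, as noted above, that idea has not been thought through.

```shell
#!/bin/sh
# Hypothetical sketch of the fetch step of 'git submodule update'.
# $1 = submodule work tree
# $2 = remote from the proposed --with-remote option (may be empty)
# $3 = "all" to use the fetch-from-every-remote variant
fetch_submodule_sketch () {
	sm_dir=$1
	with_remote=$2
	mode=$3
	if test -n "$with_remote"; then
		# proposed: fetch only from the explicitly named remote
		git -C "$sm_dir" fetch -q "$with_remote"
	elif test "$mode" = all; then
		# variant discussed above: fetch from every configured remote
		git -C "$sm_dir" fetch -q --all
	else
		# current default: plain 'git fetch'; git picks the remote itself
		git -C "$sm_dir" fetch -q
	fi
}
```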

I can understand that altering the 'origin' remote, so that it also
serves as the remote for future fetches, would help in your case, but we
have to keep other workflows in mind, and not everyone (me included)
wants only one remote in their submodules. --with-remote should also
help those people, which it does when it adds a remote under the given
name. Changing the default remote that submodule update fetches from is,
I think, a separate topic, belonging to the additional configuration we
split off from this one.

Cheers Heiko

* Re: Relative submodule URLs
  2014-08-22 16:00     ` Marc Branchaud
  2014-08-24 13:34       ` Heiko Voigt
@ 2014-08-25 13:48       ` Robert Dailey
  2014-08-28 17:44       ` Marc Branchaud
  2 siblings, 0 replies; 25+ messages in thread
From: Robert Dailey @ 2014-08-25 13:48 UTC
  To: Marc Branchaud; +Cc: Jonathan Nieder, Git, Jens Lehmann, Heiko Voigt

On Fri, Aug 22, 2014 at 11:00 AM, Marc Branchaud <marcnarc@xiplink.com> wrote:
> A couple of years ago I started to work on such a thing ([1] [2] [3]), mainly
> because when we tried to change to relative submodules we got bitten when
> someone used clone's -o option so that his super-repo had no "origin" remote
> *and* it was checked out on a detached HEAD.  So get_default_remote() failed
> for him.
>
> I didn't have time to complete the work -- it ended up being quite involved.
> But Junio did come up with an excellent transition plan [4] for adopting a
> default remote setting.
>
> [1] (v0) http://thread.gmane.org/gmane.comp.version-control.git/200145
> [2] (v1) http://thread.gmane.org/gmane.comp.version-control.git/201065
> [3] (v2) http://thread.gmane.org/gmane.comp.version-control.git/201306
> [4] http://article.gmane.org/gmane.comp.version-control.git/201332
>
>> I think you're on the right path. However I'd suggest something like
>> the following:
>>
>> [submodule]
>>     remote = <remote_for_relative_submodules> (e.g. `upstream`)
>
> I think remote.default would be more generally useful, especially when
> working with detached checkouts.

Honestly, I don't use remote.default, even now that I know about it
thanks to the ongoing discussion here. The reason is that sometimes I
push my branches to origin and sometimes to my fork, and I like
explicit control over which one I push to. I also sync my git config
file through Dropbox and use it on multiple projects and platforms, and
I don't use the same push-destination workflow on all of them. It seems
to get in the way of my workflow more than it helps. I really only ever
have two needs:

1. Push explicitly to my remote (e.g. `git push fork` or `git push origin`)
2. Push to the tracked branch (e.g. `git push`)

I'm also not sure how `push.default = simple` conflicts with the usage
of `remote.default`, since in the tracked-repo case, you must
explicitly specify the source ref to push. Is this behavior documented
somewhere?

> (For the record, I would also be happy if clone got rid of its -o option and
> "origin" became the sacred, reserved remote name (perhaps translated into
> other languages as needed) that clone always uses no matter what.)
>
>> [branch.<name>]
>>     submoduleRemote = <remote_for_relative_submodule>
>
> If I understand correctly, you want this so that your branch can be a fork of
> only the super-repo while the submodules are not forked and so they should
> keep tracking their original repo.

That's correct. But this is case-by-case. Sometimes I make a change
where I want the submodule forked (rarely); most of the time I don't.
Sometimes I can get away with pushing changes to the submodule and
worrying about it later, since I know the submodule ref won't move
forward unless someone does update --remote (which isn't often, and is
only done as needed).

> To me this seems to be going in the opposite direction of having branches
> recursively apply to submodules, which I think most of us want.
>
> A branch should fork the entire repo, including its submodules.  The
> implication is that if you want to push that branch somewhere, that somewhere
> needs to be able to accept the forks of the submodules *even if those
> submodules aren't changed in your branch* because at the very least the
> branch ref has to exist in the submodules' repositories.

There are many levels on which this can apply. When it comes to
checkouts and such, I agree. However, how will this impact *creating*
branches? What about forking? Do you expect submodule forking &
branching to be automatic as well? Based on your description, it seems
so (although a new branch doesn't necessarily have to correspond to a
new fork, unless I'm misunderstanding you). This seems difficult to
do, especially the forking part since you would need an API for this
(Github, Atlassian Stash, etc), unless you are thinking of something
clever like local/relative forks.

However, the inconvenience of forking manually isn't the main reason
why I avoid forking submodules. It's the complication of pull
requests. There is no uniformity there, which is unfortunate.
Recursive pull requests are outside the scope of git, I realize,
but they would still be nice, and I think the suggestion you
make here lays the foundation for that.

> With absolute-path submodules, the push is a simple as creating the branch
> ref in the submodules' "home" repositories -- even if the main "somewhere"
> you're pushing to isn't one of those repositories.
>
> With relative-path submodules, the push's target repo *must* also have the
> submodules in their proper places, so that they can get updated.
> Furthermore, if you clone a repo that has relative-path submodules you *must*
> also clone the submodules.
>
> Robert, I think what you'll say to this is that you still want your branch to
> track the latest submodules updates from their "home" repository. (BTW, I'm
> confused with how you're using the terms "upstream" and "origin".  I'll use
> "home" to refer to the repository where everything starts from, and "fork"
> for the repository that your branch tracks.)  Well, you get the updates you
> want when your branch tracks a ref in the "home" repository.  But when your
> branch starts tracking a ref in another "fork" repository then you'll get the
> submodule updates in that ref's history from that "fork" repository.

My usage of 'upstream' and 'origin' was wrong. I don't use upstream
anymore, based on the explanations I've received here. I use the
following now:

origin = my central repository (authoritative)
fork = My fork of the central repo

I like your idea of forking/branching submodules recursively
based on the super repo, but I just don't see how this is possible.
How would git tell GitHub to fork, for example? And would that also
work on Stash?

> Once your branch is tracking the "fork" repository, if you do a pull you
> won't get any submodule updates because the fork's branch hasn't changed.
> You need to fetch (recursively) from the "home" repo to get the submodule
> updates (assuming one of the "home" repo's branches has updated its
> submodules).  Then, with your branch checked out in the super-repo, if you
> check out the latest refs in your submodules git will tell you that you have
> uncommitted changes in your branch.  The correct way to get submodule updates
> into your branch is to commit them.  Even though you're doing a pull/rebase,
> there's nothing to rebase onto in the "fork" repository that has the updated
> submodules.

I like your ideas, assuming they are technically possible. They sound
like great solutions for the long term. For now, however, the whole
process of working with remotes is very confusing. At first it was
complicated when it came to the triangular workflow, mostly because the
way you set push.default changes completely between the two, especially
when combined with various workflows.

Add on top of that the complexity of workflows for submodules, and it
becomes a complete mess. Maybe for you guys who actively develop and
understand git's internals it isn't so bad. However, I don't have that
domain knowledge, so I only have a "user" perspective on the matter.
`remote.default` sounds nice, but how would I use it, given my response
in the first paragraph above?

Thanks guys, this is great discussion.

* Re: Relative submodule URLs
  2014-08-22 16:00     ` Marc Branchaud
  2014-08-24 13:34       ` Heiko Voigt
  2014-08-25 13:48       ` Robert Dailey
@ 2014-08-28 17:44       ` Marc Branchaud
  2014-08-28 19:35         ` Heiko Voigt
  2 siblings, 1 reply; 25+ messages in thread
From: Marc Branchaud @ 2014-08-28 17:44 UTC
  To: Robert Dailey, Jonathan Nieder; +Cc: Git, Jens Lehmann, Heiko Voigt

Sorry for dropping out of the conversation; the last few days were a bit hectic.

Regarding recursive branching, I agree that a super-repo's branch names are
not necessarily appropriate for its submodules, and that Heiko's "simple
workflow" is a workable base to build upon.  More thought is needed here, but
that's for another day.

Regarding remote.default, Robert, please understand that the feature doesn't
exist; the idea is for it to serve only as a fallback when the current methods
for remote selection end up resorting to the hardcoded "origin" name.  More
thought is also needed here, but not today.

Both Heiko and Robert took issue with this statement of mine:

On 14-08-22 12:00 PM, Marc Branchaud wrote:
> A branch should fork the entire repo, including its submodules.  The
> implication is that if you want to push that branch somewhere, that
> somewhere needs to be able to accept the forks of the submodules *even
> if those submodules aren't changed in your branch* because at the very
> least the branch ref has to exist in the submodules' repositories.

Heiko said: "It should be easy to work on a repository that is forked in its
entirety, but it should also be possible (and properly supported) to only
fork some submodules."

You're right, I overstated it when I said that the branch ref has to exist in
the unchanged submodules.  The super-repo branch records which submodules it
updates, and when pushing the branch somewhere only those submodules' changes
need to be pushed.

Robert asked: "How will this impact *creating* branches? What about forking?
Do you expect submodule forking & branching to be automatic as well? ... This
seems difficult to do, especially the forking part since you would need an
API for this (Github, Atlassian Stash, etc), unless you are thinking of
something clever like local/relative forks."

I meant "fork" in the local-branch sense:  The branch represents a topic in
the repository, and it should encompass the entire repository including its
submodules (just like the branch encompasses all the files in the repository,
even though the branch's commits only change a subset of those files).  I
think you're talking about "fork" in the sense of setting up a mirror of a
repository.  I agree that there aren't really any tools for automatically
doing that with repositories that contain relative-path submodules.  I think
"git clone" could learn to do it, though.

Heiko also said this:
> On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
>> With relative-path submodules, the push's target repo *must* also have the
>> submodules in their proper places, so that they can get updated.
>> Furthermore, if you clone a repo that has relative-path submodules you
>> *must* also clone the submodules.
>
> That is not true. You can have relative submodules and just clone/fetch
> some from a different remote. It's just a question of how to
> specify/transport this information.

I meant that more as a general guideline than some kind of physical law.
Sure, it's possible to scatter the submodules across all sorts of hosts, but
it's not a good idea.  When it comes to relative-path submodules, pushing and
fetching submodule changes in the super-repo should just involve the one
remote host (whatever way that's determined).  This keeps things tractable,
because otherwise your branch's changes are scattered among many different
hosts and you end up considering weird things like "this part of the branch's
changes are on host A but this other part are on host B, so let's record that
somewhere, oh but what if host B is down when I'm trying to fetch, but I know
that host C has the changes too so why don't I just fetch what I want from
there".

It's a nightmare.  It's infinitely better to treat a repository and its
relative-path submodules as an atomic unit, so that any remote that hosts the
repository also hosts the submodules.  When pushing a branch with submodule
changes, expect to find those submodules on the target remote and update
them.  Regardless of how the target remote is determined.  Same thing for
fetching.  It's just so much simpler to work this way.
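To make the "atomic unit" point concrete: a relative submodule URL in .gitmodules is resolved against the URL of the superproject's remote, so a fork at a different location looks for its submodules next to itself. Below is a minimal sketch of that resolution, assuming simplified rules (only leading "./" and "../" segments, "/"-separated URLs, no scp-style "host:path" forms). The function name resolve_submodule_url and the example.com URLs are hypothetical; this is not Git's own implementation.

```shell
#!/bin/sh
# Hypothetical helper (not Git's own code): resolve a relative submodule
# URL from .gitmodules against the superproject's remote URL.
# Simplified assumptions: only leading "./" and "../" segments are handled,
# and the remote URL uses "/" separators (no scp-style "host:path" forms).
resolve_submodule_url () {
	base=${1%/}   # superproject remote URL, one trailing "/" dropped
	rel=$2        # relative URL as written in .gitmodules
	while :; do
		case "$rel" in
		../*)
			base=${base%/*}   # each "../" strips one path component
			rel=${rel#../}
			;;
		./*)
			rel=${rel#./}     # "./" stays at the superproject's level
			;;
		*)
			break
			;;
		esac
	done
	printf '%s/%s\n' "$base" "$rel"
}

# The same .gitmodules entry lands on whichever host the superproject uses:
resolve_submodule_url https://example.com/upstream/super.git ../lib.git
# prints https://example.com/upstream/lib.git
resolve_submodule_url https://example.com/fork/super.git ../lib.git
# prints https://example.com/fork/lib.git
```

Under these assumptions, forking super.git without also creating lib.git next to it leaves the relative URL pointing at a repository that does not exist on the fork.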

So please, let's not try to specify submodule remotes per-branch or make that
info pushable.  It's enough for a branch's local configuration to say that it
tracks fetch/pull refs on different remotes.  The rest should flow from that.

		M.


* Re: Re: Relative submodule URLs
  2014-08-28 17:44       ` Marc Branchaud
@ 2014-08-28 19:35         ` Heiko Voigt
  2014-08-29 15:09           ` Marc Branchaud
  0 siblings, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2014-08-28 19:35 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: Robert Dailey, Jonathan Nieder, Git, Jens Lehmann

On Thu, Aug 28, 2014 at 01:44:18PM -0400, Marc Branchaud wrote:
> Heiko also said this:
> > On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
> >> With relative-path submodules, the push's target repo *must* also have the
> >> submodules in their proper places, so that they can get updated.
> >> Furthermore, if you clone a repo that has relative-path submodules you
> >> *must* also clone the submodules.
> >
> > That is not true. You can have relative submodules and just clone/fetch
> > some from a different remote. It's just a question of how to
> > specify/transport this information.
> 
> I meant that more as a general guideline than some kind of physical law.
> Sure, it's possible to scatter the submodules across all sorts of hosts, but
> it's not a good idea.  When it comes to relative-path submodules, pushing and
> fetching submodule changes in the super-repo should just involve the one
> remote host (whatever way that's determined).  This keeps things tractable,
> because otherwise your branch's changes are scattered among many different
> hosts and you end up considering weird things like "this part of the branch's
> changes are on host A but this other part are on host B, so let's record that
> somewhere, oh but what if host B is down when I'm trying to fetch, but I know
> that host C has the changes too so why don't I just fetch what I want from
> there".
> 
> It's a nightmare.  It's infinitely better to treat a repository and its
> relative-path submodules as an atomic unit, so that any remote that hosts the
> repository also hosts the submodules.  When pushing a branch with submodule
> changes, expect to find those submodules on the target remote and update
> them.  Regardless of how the target remote is determined.  Same thing for
> fetching.  It's just so much simpler to work this way.

You are right, it's simpler. But I would not say "better". Depending on
your project it might be "better" to just fork some submodules.

> So please, let's not try to specify submodule remotes per-branch or make that
> info pushable.  It's enough for a branch's local configuration to say that it
> tracks fetch/pull refs on different remotes.  The rest should flow from that.

Why not? Git is all about flexibility. Of course if you organise your
submodules in chaos you will get chaos. But consider this:

You have this big project which consists of submodules (e.g. Android,
with hundreds of submodules). Now you want to work on something that
involves just a subset of submodules, let's say two submodules.

Now if someone just wants to publish a small change to some submodules,
you are demanding that they set up a mirror of *all* submodules in this
big project. That might not even be feasible depending on the project's
size and the remote quota. Not to mention having to first create a
fork of hundreds of repositories. So in this situation we should support
pointing just some submodules to other places.

Regarding transporting this information: if you ask someone to try out
your change, it should be as simple as possible. It should be enough to
say: clone from there and check out that branch (once recursive checkout
and fetch for submodules is in place). So here we need a way to
transport this configuration for a fork.

Yes, for a small project where it's feasible to simply clone all
submodules you can just say: please fork everything. But for bigger
projects that's not necessarily an option. So we should at least give the
users that option. Then it's a matter of policy how you work with a
project.

I am not saying that everything for this should be implemented in the
first steps but we should keep it in mind and design everything in such
a way that it is still possible to implement such a kind of workflow
later.

Cheers Heiko


* Re: Relative submodule URLs
  2014-08-28 19:35         ` Heiko Voigt
@ 2014-08-29 15:09           ` Marc Branchaud
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Branchaud @ 2014-08-29 15:09 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Robert Dailey, Jonathan Nieder, Git, Jens Lehmann

On 14-08-28 03:35 PM, Heiko Voigt wrote:
> On Thu, Aug 28, 2014 at 01:44:18PM -0400, Marc Branchaud wrote:
>> Heiko also said this:
>>> On Fri, Aug 22, 2014 at 12:00:07PM -0400, Marc Branchaud wrote:
>>>> With relative-path submodules, the push's target repo *must* also have the
>>>> submodules in their proper places, so that they can get updated.
>>>> Furthermore, if you clone a repo that has relative-path submodules you
>>>> *must* also clone the submodules.
>>>
>>> That is not true. You can have relative submodules and just clone/fetch
>>> some from a different remote. It's just a question of how to
>>> specify/transport this information.
>>
>> I meant that more as a general guideline than some kind of physical law.
>> Sure, it's possible to scatter the submodules across all sorts of hosts, but
>> it's not a good idea.  When it comes to relative-path submodules, pushing and
>> fetching submodule changes in the super-repo should just involve the one
>> remote host (whatever way that's determined).  This keeps things tractable,
>> because otherwise your branch's changes are scattered among many different
>> hosts and you end up considering weird things like "this part of the branch's
>> changes are on host A but this other part are on host B, so let's record that
>> somewhere, oh but what if host B is down when I'm trying to fetch, but I know
>> that host C has the changes too so why don't I just fetch what I want from
>> there".
>>
>> It's a nightmare.  It's infinitely better to treat a repository and its
>> relative-path submodules as an atomic unit, so that any remote that hosts the
>> repository also hosts the submodules.  When pushing a branch with submodule
>> changes, expect to find those submodules on the target remote and update
>> them.  Regardless of how the target remote is determined.  Same thing for
>> fetching.  It's just so much simpler to work this way.
> 
> You are right, it's simpler. But I would not say "better". Depending on
> your project it might be "better" to just fork some submodules.

I think we need a clear definition of "fork" here.  Are you concerned that
there are copies of the submodule repositories that are "unused" in the
branch?  (Indeed, yes, you are, as I see below.)

>> So please, let's not try to specify submodule remotes per-branch or make that
>> info pushable.  It's enough for a branch's local configuration to say that it
>> tracks fetch/pull refs on different remotes.  The rest should flow from that.
> 
> Why not? Git is all about flexibility. Of course if you organise your
> submodules in chaos you will get chaos. But consider this:
> 
> You have this big project which consists of submodules (e.g. Android,
> with hundreds of submodules). Now you want to work on something that
> involves just a subset of submodules, let's say two submodules.
> 
> Now if someone just wants to publish a small change to some submodules,
> you are demanding that they set up a mirror of *all* submodules in this
> big project. That might not even be feasible depending on the project's
> size and the remote quota. Not to mention having to first create a
> fork of hundreds of repositories. So in this situation we should support
> pointing just some submodules to other places.

I feel that this scenario is something of a straw-man.  At the very least,
the developer already has a clone of all the submodules.  Disk space is cheap.

(If the developer doesn't need all the submodules then I suggest that the
super-project is badly organized and should use intermediate submodules to
make it easier for developers to focus on isolated areas.  That being said, I
can appreciate that repository hygiene is more art than science, and that a
large and/or long-lived project could end up with some pretty funky
configurations.)

> Regarding transporting this information: if you ask someone to try out
> your change, it should be as simple as possible. It should be enough to
> say: clone from there and check out that branch (once recursive checkout
> and fetch for submodules is in place). So here we need a way to
> transport this configuration for a fork.

You're assuming that the super-project is organized in such a way that
submodule-reliant code changes can live in isolation from the rest of the
project.  That's a bit like saying you can try out a change in gitk without
having the rest of git.  The super-project exists as a complete thing, and I
don't believe there are many projects where it would make sense to only try
out a change in isolation.  I'm not familiar with the Android project, but
I'd be mighty impressed if changes to any arbitrary subset of its submodules
could be thoroughly tested without a full Android system.

So I don't believe the scenario you're suggesting is at all simple.  The
person trying out the change can't just "clone from there" because the
submodules unaffected by the branch aren't there.  At the very least this
person needs to start with "origin" clones of the super-project and all of
its required submodules, not just the ones changed in the branch.  Then the
person can add the "fork" host as a remote and fetch the branch.

But it's still not that simple.  Because now you're expecting that the branch
somehow has information that overrides some submodules' URLs stored in
.git/config.  Coding that might be easy, I don't know, but as you say the
override information needs to be stored somewhere transportable and
branchable, like maybe a .gitmodules-fork file or something.  Because
obviously different branches will have different submodule overrides.

And that makes it even more complicated!  If the remote-overriding
information is stored as part of the branch then in fact that branch can't
just be merged and pushed to the "origin" host, because the submodules there
must not have their remotes overridden.  So now the branch has to be changed
in order to remove the overrides.  Users have to remember to do that, or
they'll break the origin's submodules.  But when the branch changes, suddenly
whatever people reviewed in the "fork" isn't what gets pushed back to the
"origin".

> Yes, for a small project where it's feasible to simply clone all
> submodules you can just say: please fork everything. But for bigger
> projects that's not necessarily an option. So we should at least give the
> users that option. Then it's a matter of policy how you work with a
> project.

OK, but even if we want to eventually do both perhaps it would be wiser to
start with the simple fork-everything model.  Maybe just teach "clone
--mirror" to also create relative-path submodules in their proper locations,
so that forks become easier.

> I am not saying that everything for this should be implemented in the
> first steps but we should keep it in mind and design everything in such
> a way that it is still possible to implement such a kind of workflow
> later.

I agree with using an incremental approach, but it's important to understand
where we want to go before suggesting a first step.  I'm just trying to think
through the implications of what's been suggested.  Please set me straight if
I'm not thinking about this properly.

		M.


end of thread, other threads:[~2014-08-29 15:09 UTC | newest]

Thread overview: 25+ messages
2014-08-18 18:22 Relative submodule URLs Robert Dailey
2014-08-18 20:55 ` Jonathan Nieder
2014-08-19 10:24   ` Heiko Voigt
2014-08-19 16:15     ` Robert Dailey
2014-08-19 16:39       ` Junio C Hamano
2014-08-19 16:50         ` Robert Dailey
2014-08-19 19:19           ` Junio C Hamano
2014-08-19 20:18             ` Robert Dailey
2014-08-19 19:30           ` Heiko Voigt
2014-08-19 20:23             ` Robert Dailey
2014-08-19 20:57               ` Heiko Voigt
2014-08-20 13:18                 ` Robert Dailey
2014-08-21 12:37                   ` Heiko Voigt
2014-08-19 16:07   ` Robert Dailey
2014-08-22 16:00     ` Marc Branchaud
2014-08-24 13:34       ` Heiko Voigt
2014-08-25 14:29         ` Robert Dailey
2014-08-25 14:32           ` Robert Dailey
2014-08-26  6:28           ` Heiko Voigt
2014-08-26 15:18             ` Robert Dailey
2014-08-26 20:34               ` Heiko Voigt
2014-08-25 13:48       ` Robert Dailey
2014-08-28 17:44       ` Marc Branchaud
2014-08-28 19:35         ` Heiko Voigt
2014-08-29 15:09           ` Marc Branchaud
