git.vger.kernel.org archive mirror
* problems serving non-bare repos with submodules over http
@ 2016-04-20 15:22 Yaroslav Halchenko
  2016-04-20 16:14 ` Stefan Beller
  0 siblings, 1 reply; 12+ messages in thread
From: Yaroslav Halchenko @ 2016-04-20 15:22 UTC (permalink / raw)
  To: Git Gurus hangout; +Cc: Benjamin Poldrack, Joey Hess

Dear Git Folks,

I do realize that the situation is quite uncommon, partially I guess due
to git submodules mechanism flexibility and power on one hand and
under-use (imho) on the other, which leads to discovery of regressions
[e.g. 1] and corner cases as mine.

[1] http://thread.gmane.org/gmane.comp.version-control.git/288064
[2] http://www.onerussian.com/tmp/git-web-submodules.sh

My use case:  We are trying to serve a git repository with submodules
specified with relative paths over http from a simple web server.  With a demo
case and submodule specification [complete script to reproduce including the
webserver using python is at 2] such as

(git)hopa:/tmp/gitxxmsxYFO[master]git
$> tree
.
├── f1
└── sub1
    └── f2

$> cat .gitmodules
[submodule "sub1"]
    path = sub1
    url = ./sub1


1. After cloning 

    git clone http://localhost:8080/.git

   I cannot 'submodule update' the sub1 in the clone since its url after
   'submodule init' would be  http://localhost:8080/.git/sub1 .  If I manually fix
   it up -- it seems to proceed normally since in original repository I have
   sub1/.git/ directory and not the "gitlink" for that submodule.

2. If I serve the clone [2 demos that too] itself, there is no easy remedy at
   all since sub1/.git is not a directory but a gitlink.
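
The URL construction described in point 1 can be sketched in plain shell.
This is only an illustration, not git's own code: resolve_submodule_url is a
hypothetical helper, and it assumes the submodule URL starts with "./" or
"../" as in the .gitmodules shown above.

```shell
#!/bin/sh
# Hypothetical helper (not git's implementation): resolve a relative
# submodule URL against the URL the superproject was cloned from.
# Assumes the second argument starts with "./" or "../".
resolve_submodule_url() {
    base=${1%/}      # superproject remote URL, one trailing slash dropped
    rel=$2           # "url" value taken from .gitmodules
    while :; do
        case $rel in
            ./*)  rel=${rel#./} ;;                    # "./" stays at the same level
            ../*) rel=${rel#../}; base=${base%/*} ;;  # "../" drops one component
            *)    break ;;
        esac
    done
    printf '%s/%s\n' "$base" "$rel"
}

# Cloning from .../.git puts the submodule below .git:
resolve_submodule_url http://localhost:8080/.git ./sub1
# prints http://localhost:8080/.git/sub1

# Cloning from the bare host name would give the URL that actually exists:
resolve_submodule_url http://localhost:8080 ./sub1
# prints http://localhost:8080/sub1
```

The first call reproduces the URL that 'submodule init' records in point 1.
The second shows why being able to clone from the URL without the trailing
/.git would avoid the problem.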

N.B. I haven't approached nested submodules case yet in [2]

I wondered

a. could 'git clone' (probably actually some relevant helper used by fetch
   etc) acquire ability to sense for URL/.git if URL itself doesn't point to a
   usable git repository?

    I think this could provide complete remedy for 1 since then relative urls
    would be properly assembled, with similar 'sensing' for /.git for the final urls

    I guess we could do it with rewrites/forwards on the "server side",
    but it wouldn't be generally acceptable solution.

b. is there a better or already existing way to remedy my situation?

c. shouldn't "git clone" (or the relevant helper) be aware of remote
   /.git possibly being a gitlink file within submodule?


Thank you in advance for your thoughts and feedback on this case.

P.S. Please maintain the CC list in replies.
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


* Re: problems serving non-bare repos with submodules over http
  2016-04-20 15:22 problems serving non-bare repos with submodules over http Yaroslav Halchenko
@ 2016-04-20 16:14 ` Stefan Beller
  2016-04-20 19:45   ` Yaroslav Halchenko
  2016-04-20 19:51   ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Beller @ 2016-04-20 16:14 UTC (permalink / raw)
  To: Yaroslav Halchenko
  Cc: Git Gurus hangout, Benjamin Poldrack, Joey Hess, Jens Lehmann

On Wed, Apr 20, 2016 at 8:22 AM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
> Dear Git Folks,
>
> I do realize that the situation is quite uncommon, partially I guess due
> to git submodules mechanism flexibility and power on one hand and
> under-use (imho) on the other, which leads to discovery of regressions
> [e.g. 1] and corner cases as mine.

Thanks for fixing the under-use and reporting bugs. :)

>
> [1] http://thread.gmane.org/gmane.comp.version-control.git/288064
> [2] http://www.onerussian.com/tmp/git-web-submodules.sh
>
> My use case:  We are trying to serve a git repository with submodules
> specified with relative paths over http from a simple web server.  With a demo
> case and submodule specification [complete script to reproduce including the
> webserver using python is at 2] such as
>
> (git)hopa:/tmp/gitxxmsxYFO[master]git
> $> tree
> .
> ├── f1
> └── sub1
>     └── f2
>
> $> cat .gitmodules
> [submodule "sub1"]
>     path = sub1
>     url = ./sub1
>
>
> 1. After cloning
>
>     git clone http://localhost:8080/.git
>
>    I cannot 'submodule update' the sub1 in the clone since its url after
>    'submodule init' would be  http://localhost:8080/.git/sub1 .  If I manually fix
>    it up -- it seems to proceed normally since in original repository I have
>    sub1/.git/ directory and not the "gitlink" for that submodule.

So the expected URL would be  http://localhost:8080/sub1/.git ?

I thought you could leave out the .git suffix, i.e. you can type

     git clone http://localhost:8080

and Git will recognize the missing .git and try that as well. The relative URL
would then be constructed as http://localhost:8080/sub1, which will use the
same mechanism to find the missing .git ending.

>
> 2. If I serve the clone [2 demos that too] itself, there is no easy remedy at
>    all since sub1/.git is not a directory but a gitlink.

Not sure I understand the second question.

>
> N.B. I haven't approached nested submodules case yet in [2]
>
> I wondered
>
> a. could 'git clone' (probably actually some relevant helper used by fetch
>    etc) acquire ability to sense for URL/.git if URL itself doesn't point to a
>    usable git repository?

So you mean in case of relative submodules, we need to take the parent
url, and remove the ".git" at the end and try again if we cannot find
the submodule?

>
>     I think this could provide complete remedy for 1 since then relative urls
>     would be properly assembled, with similar 'sensing' for /.git for the final urls
>
>     I guess we could do it with rewrites/forwards on the "server side",
>     but it wouldn't be generally acceptable solution.
>
> b. is there a better or already existing way to remedy my situation?
>
> c. shouldn't "git clone" (or the relevant helper) be aware of remote
>    /.git possibly being a gitlink file within submodule?

Oh. I think that non-bare repositories including submodules are not designed
to be cloned, because they are for use in the file system. Even a
local clone fails:

    # gerrit is a project I know which also has submodules:
    git clone --recurse-submodules https://gerrit.googlesource.com/gerrit g1
    git clone --recurse-submodules g1 g2
    ...
fatal: clone of '...' into submodule path '...' failed

So I think for cloning repositories you want to have each repository
as its own thing (bare or non bare).

The submodule mechanism is just a way to express a relation between
the repositories, it's like composing them together, but by that composition
it breaks the properties of each repository to be easily clonable.

I think we should fix that.

I guess the local clone case is 'easy' as you only need
to handle the link instead of directory thing correctly.

For the case you describe (cloning from a remote, whether it is http or ssh),
we would need to discuss security implications I would assume? It sounds
scary at first to follow a random git link to the outer space of the repository.
(A similar thing is that you cannot have symlinks in a git repository pointing
outside of it, IIRC? At least that was fishy.)

Thanks,
Stefan

>
>
> Thank you in advance for your thoughts and feedback on this case.
>
> P.S. Please maintain the CC list in replies.


* Re: problems serving non-bare repos with submodules over http
  2016-04-20 16:14 ` Stefan Beller
@ 2016-04-20 19:45   ` Yaroslav Halchenko
  2016-04-20 19:51   ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Yaroslav Halchenko @ 2016-04-20 19:45 UTC (permalink / raw)
  To: Git Gurus hangout; +Cc: Benjamin Poldrack, Joey Hess, Jens Lehmann


On Wed, 20 Apr 2016, Stefan Beller wrote:
> > I do realize that the situation is quite uncommon, partially I guess due
> > to git submodules mechanism flexibility and power on one hand and
> > under-use (imho) on the other, which leads to discovery of regressions
> > [e.g. 1] and corner cases as mine.

> Thanks for fixing the under-use and reporting bugs. :)

I am thrilled to help ;)

> > [1] http://thread.gmane.org/gmane.comp.version-control.git/288064
> > [2] http://www.onerussian.com/tmp/git-web-submodules.sh

> > My use case:  We are trying to serve a git repository with submodules
> > specified with relative paths over http from a simple web server.  With a demo
> > case and submodule specification [complete script to reproduce including the
> > webserver using python is at 2] such as

> > (git)hopa:/tmp/gitxxmsxYFO[master]git
> > $> tree
> > .
> > ├── f1
> > └── sub1
> >     └── f2

> > $> cat .gitmodules
> > [submodule "sub1"]
> >     path = sub1
> >     url = ./sub1


> > 1. After cloning

> >     git clone http://localhost:8080/.git

> >    I cannot 'submodule update' the sub1 in the clone since its url after
> >    'submodule init' would be  http://localhost:8080/.git/sub1 .  If I manually fix
> >    it up -- it seems to proceed normally since in original repository I have
> >    sub1/.git/ directory and not the "gitlink" for that submodule.

> So the expected URL would be  http://localhost:8080/sub1/.git ?

ATM, yes

> I thought you could leave out the .git suffix, i.e. you can type

>      git clone http://localhost:8080

> and Git will recognize the missing .git and try that as well. The relative URL
> would then be constructed as http://localhost:8080/sub1, which will use the
> same mechanism to find the missing .git ending.

[note1] Unfortunately it is not the case ATM (git version
2.8.1.369.geae769a, output is interspersed with log from the python's simple
http server):

$> git clone http://localhost:8080 xxx                   
Cloning into 'xxx'...             
127.0.0.1 - - [20/Apr/2016 15:01:25] code 404, message File not found
127.0.0.1 - - [20/Apr/2016 15:01:25] "GET /info/refs?service=git-upload-pack HTTP/1.1" 404 -
fatal: repository 'http://localhost:8080/' not found


> > 2. If I serve the clone [2 demos that too] itself, there is no easy remedy at
> >    all since sub1/.git is not a directory but a gitlink.

> Not sure I understand the second question.

If I serve via http a repository where sub1/.git is a "gitlink":

    (git)hopa:/tmp/gitxxmsxYFO_[master]
    $> cat sub1/.git 
    gitdir: ../.git/modules/sub1

Such repository cannot be cloned:

    (git)hopa:/tmp/gitxxmsxYFO_[master]git
    $> git clone http://localhost:8080/sub1 /tmp/xxx
    Cloning into '/tmp/xxx'...                      
    127.0.0.1 - - [20/Apr/2016 15:04:01] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:01] "GET /sub1/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/' not found

    $> git clone http://localhost:8080/sub1/.git /tmp/xxx 
    Cloning into '/tmp/xxx'...
    127.0.0.1 - - [20/Apr/2016 15:04:06] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:06] "GET /sub1/.git/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/.git/' not found
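
For illustration, here is a minimal shell sketch of what "sensing" the
gitlink would mean on a local file system.  real_git_dir is a hypothetical
helper, not an existing git command, and it only handles the simple
one-line "gitdir: <path>" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the repository directory behind DIR/.git.
# If DIR/.git is a directory it is used directly; if it is a gitlink file
# of the form "gitdir: <path>", the path is followed instead.
real_git_dir() {
    dir=$1
    if [ -d "$dir/.git" ]; then
        printf '%s\n' "$dir/.git"
    elif [ -f "$dir/.git" ]; then
        target=$(sed -n 's/^gitdir: //p' "$dir/.git" | head -n 1)
        [ -n "$target" ] || return 1
        case $target in
            /*) printf '%s\n' "$target" ;;        # absolute path: use as-is
            *)  printf '%s\n' "$dir/$target" ;;   # relative to the .git file's directory
        esac
    else
        return 1                                  # no .git entry at all
    fi
}

# Usage, run from the top of the served repository:
#   real_git_dir sub1
```

With the sub1/.git file shown above this would print
sub1/../.git/modules/sub1, which is the location a client would have to
fetch from instead of sub1/.git.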


> > N.B. I haven't approached nested submodules case yet in [2]

> > I wondered

> > a. could 'git clone' (probably actually some relevant helper used by fetch
> >    etc) acquire ability to sense for URL/.git if URL itself doesn't point to a
> >    usable git repository?

> So you mean in case of relative submodules, we need to take the parent
> url, and remove the ".git" at the end and try again if we cannot find
> the submodule?

that would be the a.2 which I have forgotten to outline ;)

in a.  I was suggesting what you have assumed [note 1 above] would be
happening (but doesn't) ATM: that /.git would be automagically sensed.

> >     I think this could provide complete remedy for 1 since then relative urls
> >     would be properly assembled, with similar 'sensing' for /.git for the final urls

> >     I guess we could do it with rewrites/forwards on the "server side",
> >     but it wouldn't be generally acceptable solution.

> > b. is there a better or already existing way to remedy my situation?

> > c. shouldn't "git clone" (or the relevant helper) be aware of remote
> >    /.git possibly being a gitlink file within submodule?

> Oh. I think that non-bare repositories including submodules are not designed
> to be cloned, because they are for use in the file system.

Well -- that is the beauty of git being a distributed VCS: non-bare repos
seem to be as nicely cloneable as bare ones.  And in general that seems to
work with submodules as well, since philosophically they should behave
"consistently"...

>  Even a local clone fails:

>     # gerrit is a project I know which also has submodules:
>     git clone --recurse-submodules https://gerrit.googlesource.com/gerrit g1
>     git clone --recurse-submodules g1 g2
>     ...
> fatal: clone of '...' into submodule path '...' failed

I guess that is just yet another bug with relative paths in the
submodules.

> So I think for cloning repositories you want to have each repository
> as its own thing (bare or non bare).

The first line of your example above is itself something of a
counter-argument to that statement.  Indeed each repository should be its own
thing, just possibly registered as a submodule of another one.

> The submodule mechanism is just a way to express a relation between
> the repositories, it's like composing them together, but by that composition
> it breaks the properties of each repository to be easily clonable.

It doesn't really (except in the cases we both pointed out).  E.g. I can just
as easily clone the original sub1 repository which was registered as a
submodule of another one.  Whether git's treatment of submodules during cloning
(placing them under the root repo's .git/modules, etc.) undermines that
feature -- that is a question we could also discuss here, I guess ;)

> I think we should fix that.

would be awesome! Thanks in advance ;)

> I guess the local clone case is 'easy' as you only need
> to handle the link instead of directory thing correctly.

> For the case you describe (cloning from a remote, whether it is http or ssh),
> we would need to discuss security implications I would assume? It sounds
> scary at first to follow a random git link to the outer space of the repository.

more like "into the inner space".  git already (as the example above shows)
descends right away into "/info/refs?", so how would sensing "/.git/" be any
different?

> (A similar thing is that you cannot have symlinks in a git repository pointing
> outside of it, IIRC? At least that was fishy.)

that might indeed be dangerous.  But once again, per the argument above, I
guess it is similarly up to the "provider" to guarantee protection, e.g. by
forbidding the webserver to follow symlinks in the served directory if its
content is not under the provider's control.

Cheers and thanks for your quick reply Stefan!
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


* Re: problems serving non-bare repos with submodules over http
  2016-04-20 16:14 ` Stefan Beller
  2016-04-20 19:45   ` Yaroslav Halchenko
@ 2016-04-20 19:51   ` Junio C Hamano
  2016-04-20 21:05     ` Stefan Beller
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-04-20 19:51 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Yaroslav Halchenko, Git Gurus hangout, Benjamin Poldrack,
	Joey Hess, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

>> 1. After cloning
>>
>>     git clone http://localhost:8080/.git
>>
>>    I cannot 'submodule update' the sub1 in the clone since its url after
>>    'submodule init' would be  http://localhost:8080/.git/sub1 .  If I manually fix
>>    it up -- it seems to proceed normally since in original repository I have
>>    sub1/.git/ directory and not the "gitlink" for that submodule.
>
> So the expected URL would be  http://localhost:8080/sub1/.git ?
>
> I thought you could leave out the .git suffix, i.e. you can type
>
>      git clone http://localhost:8080
>
> and Git will recognize the missing .git and try that as well. The relative URL
> would then be constructed as http://localhost:8080/sub1, which will use the
> same mechanism to find the missing .git ending.

I may be missing the subtleties, but if you are serving others from
a non-bare repository with submodules, I do not think you would want
to expose the in-tree version of the submodule in the first place.

These $submodule/.git files point via "gitdir:" to their real
repository location, don't they?  And I would think that they are
what you would want to expose to the outside world.  Your in-tree
submodules may come and go as you checkout different branches in
your working tree, but these copies at their real locations will
stay.


* Re: problems serving non-bare repos with submodules over http
  2016-04-20 19:51   ` Junio C Hamano
@ 2016-04-20 21:05     ` Stefan Beller
  2016-04-20 21:27       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Beller @ 2016-04-20 21:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Yaroslav Halchenko, Git Gurus hangout, Benjamin Poldrack,
	Joey Hess, Jens Lehmann

On Wed, Apr 20, 2016 at 12:51 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>> 1. After cloning
>>>
>>>     git clone http://localhost:8080/.git
>>>
>>>    I cannot 'submodule update' the sub1 in the clone since its url after
>>>    'submodule init' would be  http://localhost:8080/.git/sub1 .  If I manually fix
>>>    it up -- it seems to proceed normally since in original repository I have
>>>    sub1/.git/ directory and not the "gitlink" for that submodule.
>>
>> So the expected URL would be  http://localhost:8080/sub1/.git ?
>>
>> I thought you could leave out the .git suffix, i.e. you can type
>>
>>      git clone http://localhost:8080
>>
>> and Git will recognize the missing .git and try that as well. The relative URL
>> would then be constructed as http://localhost:8080/sub1, which will use the
>> same mechanism to find the missing .git ending.
>
> I may be missing the subtleties, but if you are serving others from
> a non-bare repository with submodules, I do not think you would want
> to expose the in-tree version of the submodule in the first place.

Well I would imagine that is the exact point.
If I was not trying to expose my state, I could ask you to
obtain your copy from $(git remote get-url origin) just as I did.

I would imagine, if I have a problem with some repo I can tell my
coworker or others to get my copy to look into that exact state.
(Or I want to transfer state from workstation to laptop to
continue working)

Without submodules this workflow works. So I'd expect it
to work with submodules as well eventually. Also we probably don't
want to mix cloning the superproject from this non bare repo and
the generic submodule locations as the superproject may have
advanced submodule pointers to commits which are not present
in the generic submodule remotes.

So for the non-bare case I would really expect to be able to "copy"
the remote including submodules from that remote?

We could reason about only providing this for the superproject though
and not for submodules, i.e. cloning from the non-bare submodule
could be left unsupported.  If you really want that non-bare submodule,
you can still clone it manually from

    $GIT_DIR_SUPER_PROJECT/modules/$MODULE_NAME



>
> These $submodule/.git files point via "gitdir:" to their real
> repository location, don't they?

Yes they do.

> And I would think that they are
> what you would want to expose to the outside world.  Your in-tree
> submodules may come and go as you checkout different branches in
> your working tree, but these copies at their real locations will
> stay.

Right, instead of cloning $WORKTREE/sub/.git you would rather want
$GITDIR/modules/sub

(GITDIR and WORKTREE from the superprojects point of view)

The problem with a copy of a superproject including submodules is
the way cloning submodules works.

  1) clone the superproject
  2) for each gitlink in the tree, consult the .gitmodules file
  3) if we have a match in the .gitmodules file, clone from there

So currently the protocol doesn't even allow specifying the submodule
directories. If the remote superproject in 1) is non-bare, the remote
would need to advertise the submodule repository URLs separately,
so that the clones can be made from those direct copies.
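
Steps 2) and 3) above amount to a lookup from a submodule path to the URL
recorded in .gitmodules.  With git at hand the usual way is
"git config -f .gitmodules --get submodule.<name>.url"; the sketch below
does the same lookup by path with awk only, so it runs without a
repository.  submodule_url_for_path is a hypothetical name, and the parser
is deliberately naive (no comments, no quoting, no continuation lines).

```shell
#!/bin/sh
# Hypothetical helper: print the "url" recorded in a .gitmodules file
# for the submodule whose "path" equals the second argument.
# Exits non-zero if no such submodule section is found.
submodule_url_for_path() {
    awk -v want="$2" '
        /^\[submodule / { path = ""; url = "" }   # new section: reset state
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # strip leading whitespace
            sub(/[ \t]+$/, "", line)              # strip trailing whitespace
            if (line ~ /^path[ \t]*=/) {
                sub(/^path[ \t]*=[ \t]*/, "", line); path = line
            } else if (line ~ /^url[ \t]*=/) {
                sub(/^url[ \t]*=[ \t]*/, "", line); url = line
            }
            if (path == want && url != "") { print url; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage, from the top of a superproject checkout:
#   submodule_url_for_path .gitmodules sub1
```

For the .gitmodules file from the original report this prints "./sub1",
which is then resolved against the URL of the superproject's remote.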


Thanks,
Stefan

>
>


* Re: problems serving non-bare repos with submodules over http
  2016-04-20 21:05     ` Stefan Beller
@ 2016-04-20 21:27       ` Junio C Hamano
  2016-04-20 23:05         ` Stefan Beller
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-04-20 21:27 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Yaroslav Halchenko, Git Gurus hangout, Benjamin Poldrack,
	Joey Hess, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

>> I may be missing the subtleties, but if you are serving others from
>> a non-bare repository with submodules, I do not think you would want
>> to expose the in-tree version of the submodule in the first place.
>
> Well I would imagine that is the exact point.
> If I was not trying to expose my state, I could ask you to
> obtain your copy from $(git remote get-url origin) just as I did.

That wasn't what I had in mind, but if the cloner cloned from your
repository with a working tree, the cloner would discover submodules
you use from your .gitmodules file, which would record the location
you cloned them from, so something like that may come into the
picture.  What I had in mind was more like this one you mentioned
below:

>     $GIT_DIR_SUPER_PROJECT/modules/$MODULE_NAME
> ...
> Right instead of cloning $WORKTREE/sub/.git you rather want
> $GITDIR/module/sub

> So currently the protocol doesn't allow to even specify the submodules
> directories.

Depends on what you exactly mean by "the protocol", but the
networking protocol is about accessing a single repository.  It is
up to you to decide where to go next after learning what you can
learn from the result, typically by following what appears in
the .gitmodules file.

The only special case is when .gitmodules file records the URL in a
relative form, I would think.  Traditionally (i.e. when it was
considered sane to clone only from bare repositories) I think people
expected a layout like this:

	top.git/
	top.git/refs/{heads,tags,...}/...
        top.git/objects/...
        top.git/sub.git/
	top.git/sub.git/refs/{heads,tags,...}/...
        top.git/sub.git/objects/...

and refer to ./sub.git from .gitmodules recorded in top.git.  It
still would be the norm for common distribution sites (i.e. the original
place Yaroslav likely has cloned things from) to be bare, and with
or without $GIT_DIR/modules/, the relative path of submodule seen
by its superproject would (have to) be different between a bare and
a non-bare repository.

I'd imagine that people could agree on a common layout like this
even for a forest of bare repositories:

	top.git/
	top.git/refs/{heads,tags,...}/...
        top.git/objects/...
        top.git/modules/sub.git/
	top.git/modules/sub.git/refs/{heads,tags,...}/...
        top.git/modules/sub.git/objects/...

which would probably make the "relative" relationship between the
superproject and its submodules the same between bare and non-bare
repositories, but I didn't think about it too deeply.


* Re: problems serving non-bare repos with submodules over http
  2016-04-20 21:27       ` Junio C Hamano
@ 2016-04-20 23:05         ` Stefan Beller
  2016-04-21  3:14           ` Yaroslav Halchenko
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Beller @ 2016-04-20 23:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Yaroslav Halchenko, Git Gurus hangout, Benjamin Poldrack,
	Joey Hess, Jens Lehmann

On Wed, Apr 20, 2016 at 2:27 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>> I may be missing the subtleties, but if you are serving others from
>>> a non-bare repository with submodules, I do not think you would want
>>> to expose the in-tree version of the submodule in the first place.
>>
>> Well I would imagine that is the exact point.
>> If I was not trying to expose my state, I could ask you to
>> obtain your copy from $(git remote get-url origin) just as I did.
>
> That wasn't what I had in mind, but if the cloner cloned from your
> repository with a working tree, the cloner would discover submodules
> you use from your .gitmodules file, which would record the location
> you cloned them from, so something like that may come into the
> picture.  What I had in mind was more like this one you mentioned
> below:
>
>>     $GIT_DIR_SUPER_PROJECT/modules/$MODULE_NAME
>> ...
>> Right instead of cloning $WORKTREE/sub/.git you rather want
>> $GITDIR/module/sub
>
>> So currently the protocol doesn't allow to even specify the submodules
>> directories.
>
> Depends on what you exactly mean by "the protocol", but the
> networking protocol is about accessing a single repository.  It is
> up to you to decide where to go next after learning what you can
> learn from the result, typically by following what appears in
> the .gitmodules file.

Right. But the .gitmodules file is not sufficient.

If I clone from a bare hosting location, the .gitmodules file
is the best we can do and the .gitmodules file works as intended.
But in the non-bare case I think we would want to get the submodule
from that location as well.

So in git clone (which calls out to git submodule update, which uses
submodule--helper update_clone for cloning submodules) we'd want to see

    if remote was bare:
        do as usual (obtain URL from .gitmodules file)
    else
        take URL=$NON_BARE_REMOTE/module/submodule
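
As a runnable restatement of the pseudocode above, here is a small shell
sketch.  It is only an illustration of the proposal, not existing git
behaviour: pick_submodule_url is a hypothetical name, the caller has to say
whether the remote is bare (the thread does not define how a client would
learn that), and the non-bare branch assumes the submodule lives under
modules/<name> inside the remote's git directory.

```shell
#!/bin/sh
# Hypothetical helper illustrating the proposed decision.
#   $1  URL of the remote superproject's git directory
#   $2  "yes" if that remote is bare, anything else otherwise
#   $3  submodule name
#   $4  URL recorded for the submodule in .gitmodules
pick_submodule_url() {
    remote=${1%/}
    is_bare=$2
    name=$3
    gitmodules_url=$4
    if [ "$is_bare" = yes ]; then
        # bare remote: do as usual and use what .gitmodules says
        printf '%s\n' "$gitmodules_url"
    else
        # non-bare remote: take the copy stored under its git directory
        # (assumed layout: <gitdir>/modules/<name>)
        printf '%s/modules/%s\n' "$remote" "$name"
    fi
}

pick_submodule_url http://localhost:8080/.git no sub1 ./sub1
# prints http://localhost:8080/.git/modules/sub1
```

In the bare case the returned value may still be relative and would be
resolved against the superproject's URL as usual.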



>
> The only special case is when .gitmodules file records the URL in a
> relative form, I would think.  Traditionally (i.e. when it was
> considered sane to clone only from bare repositories) I think people
> expected a layout like this:
>
>         top.git/
>         top.git/refs/{heads,tags,...}/...
>         top.git/objects/...
>         top.git/sub.git/
>         top.git/sub.git/refs/{heads,tags,...}/...
>         top.git/sub.git/objects/...

which could also be referred to as

      top

without the .git suffix as someone thought this was an optimization?

Relative paths for submodules I have seen so far (on github, googlesource,
eclipse) start with a ../ such that we'd have

>         top.git/
>         top.git/refs/{heads,tags,...}/...
>         top.git/objects/...
>         sub.git/
>         sub.git/refs/{heads,tags,...}/...
>         sub.git/objects/...

and the .git suffix omission works as we only need to check the last
four characters and not somewhere in between. The sub.git is a standalone
repository, and you cannot tell it is a submodule (except by its contents).

>
> and refer to ./sub.git from .gitmodules recorded in top.git.  It
> still would be norm for common distribution sites (i.e. the original
> place Yaroslav likely has cloned things from) to be bare, and with
> or without $GIT_DIR/modules/, the relative path of submodule seen
> by its superproject would (have to) be different between a bare and
> a non-bare repository.

I think on a hosting site they could even coexist when having the
layout as above.

         top.git/
         top.git/refs/{heads,tags,...}/...
         top.git/objects/...
         sub.git/
         sub.git/refs/{heads,tags,...}/...
         sub.git/objects/...

         # the following only exist in non bare:
         top.git/modules/sub.git/
         top.git/modules/sub.git/refs/{heads,tags,...}/...
         top.git/modules/sub.git/objects/...

The latter files would be more reflective of what you *really*
want if you clone from top.git.

Traditionally (when cloning was done from bare repos only),
the .gitmodules file provides a very good way to indicate what
the intent of the superproject is as the recorded sha1 in the tree
doesn't tell you anything and tracking the remote for the submodule
out of tree is cumbersome, so an in tree solution makes perfect sense.

If we have a non-bare repo, is it safe to assume that the cloner actually
meant to get the whole state from the remote (including submodules)?

I am trying to think of reasons why you would not want to get that copy
from the remote. One (weak) reason is that the submodule may be a
well known library, which you can obtain faster from a well known git
hosting site rather than $remote.

>
> I'd imagine that people could agree on a common layout like this
> even for a forest of bare repositories:
>
>         top.git/
>         top.git/refs/{heads,tags,...}/...
>         top.git/objects/...
>         top.git/modules/sub.git/
>         top.git/modules/sub.git/refs/{heads,tags,...}/...
>         top.git/modules/sub.git/objects/...
>
> which would probably make the "relative" relationship between the
> supermodule and its submodules the same between bare and non-bare
> repositories, but I didn't think it too deeply.

Forests as of now are handled as a flat-level thing, e.g.

    git clone git://git.eclipse.org/gitroot/platform/eclipse.platform.releng.aggregator.git

will produce a superproject with 25 submodules, all of them
are either at ../ or at ../../ such that it would follow

         projects/top.git/
         projects/top.git/refs/{heads,tags,...}/...
         projects/top.git/objects/...
         projects/sub.git/
         projects/sub.git/refs/{heads,tags,...}/...
         projects/sub.git/objects/...
         libs/sub2.git
         libs/sub2.git/refs/{heads,tags,...}/...
         libs/sub2.git/objects/...

Looking at our internal code search there is no .gitmodules file
whose url starts with "./", they all start with ../ or are absolute.


* Re: problems serving non-bare repos with submodules over http
  2016-04-20 23:05         ` Stefan Beller
@ 2016-04-21  3:14           ` Yaroslav Halchenko
  2016-04-21 17:11             ` Stefan Beller
  0 siblings, 1 reply; 12+ messages in thread
From: Yaroslav Halchenko @ 2016-04-21  3:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Git Gurus hangout, Benjamin Poldrack, Joey Hess,
	Jens Lehmann

NB Thank you for the lively discussion!

On Wed, 20 Apr 2016, Stefan Beller wrote:

> >> So currently the protocol doesn't allow to even specify the submodules
> >> directories.

> > Depends on what you exactly mean by "the protocol", but the
> > networking protocol is about accessing a single repository.  It is
> > up to you to decide where to go next after learning what you can
> > learn from the result, typically by following what appears in
> > the .gitmodules file.

> Right. But the .gitmodules file is not sufficient.

why?

> >...<

> I think on a hosting site they could even coexist when having the
> layout as above.

>          top.git/
>          top.git/refs/{heads,tags,...}/...
>          top.git/objects/...
>          sub.git/
>          sub.git/refs/{heads,tags,...}/...
>          sub.git/objects/...

>          # the following only exist in non bare:
>          top.git/modules/sub.git/
>          top.git/modules/sub.git/refs/{heads,tags,...}/...
>          top.git/modules/sub.git/objects/...

> The later files would be more reflective of what you *really*
> want if you clone from top.git.

Maybe there is no need for assumptions, and .gitmodules should be
sufficient?

- an absolute url in .gitmodules provides the absolute URL/path to the
  submodule of interest, regardless of whether that submodule is present
  in the originating repository as an updated submodule.  Whether cloning
  it from there instead of from the original repository would be more
  efficient is already a heuristic which might fail miserably (maybe I
  have a faster connection to the original repository pointed to by the
  absolute url than to this particular repository)

- a relative url in .gitmodules gives the submodule's location relative
  to the location of the "top" repository, and only in that case should
  the submodule's "absolute" url be resolved relative to that of the
  "top" repository
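
The two bullets above can be sketched in Python as follows.  This is a
simplified illustration, not git's actual implementation: the function
name resolve_submodule_url and the example.org URLs are made up here, and
only urls starting with "./" or "../" are treated as relative.

```python
def resolve_submodule_url(superproject_url: str, submodule_url: str) -> str:
    """Resolve a .gitmodules url against the superproject URL (simplified sketch)."""
    # Assumption: anything not starting with ./ or ../ is absolute and used as-is.
    if not submodule_url.startswith(("./", "../")):
        return submodule_url

    base = superproject_url.rstrip("/")
    rest = submodule_url
    while True:
        if rest.startswith("./"):
            # "./" stays at the current level.
            rest = rest[2:]
        elif rest.startswith("../"):
            # "../" drops the last path component of the base.
            rest = rest[3:]
            base = base.rsplit("/", 1)[0]
        else:
            break
    return f"{base}/{rest}"


if __name__ == "__main__":
    # The case from the start of this thread: url = ./sub1
    print(resolve_submodule_url("http://localhost:8080/.git", "./sub1"))
    # A flat hosting layout with a "../" url (hypothetical host name)
    print(resolve_submodule_url("https://example.org/projects/top.git", "../sub.git"))
```

With url = ./sub1 and a superproject cloned from
http://localhost:8080/.git, the sketch produces
http://localhost:8080/.git/sub1, which is the url reported after
'submodule init' in the first message of this thread.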

NB I will consider it a separate issue whether relative paths
without a '../' prefix make any sense in bare repositories.

or have I missed the point?
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


* Re: problems serving non-bare repos with submodules over http
  2016-04-21  3:14           ` Yaroslav Halchenko
@ 2016-04-21 17:11             ` Stefan Beller
  2016-04-21 17:45               ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Beller @ 2016-04-21 17:11 UTC (permalink / raw)
  To: Yaroslav Halchenko
  Cc: Junio C Hamano, Git Gurus hangout, Benjamin Poldrack, Joey Hess,
	Jens Lehmann

On Wed, Apr 20, 2016 at 8:14 PM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
> NB Thank you for the lively discussion!
>
> On Wed, 20 Apr 2016, Stefan Beller wrote:
>
>> >> So currently the protocol doesn't allow to even specify the submodules
>> >> directories.
>
>> > Depends on what you exactly mean by "the protocol", but the
>> > networking protocol is about accessing a single repository.  It is
>> > up to you to decide where to go next after learning what you can
>> > learn from the result, typically by following what appears in
>> > the .gitmodules file.
>
>> Right. But the .gitmodules file is not sufficient.
>
> why?

What do you expect from cloning a repo with submodules?

In case of a bare repo:

    Get the repo from the specified remote and get the submodules
    from "somewhere" (and .gitmodules helps you guessing where
    "somewhere" is).

This has been the traditional way, and the .gitmodules file
is just a helper for a best guess at where to get a submodule sha1
from. (The repo pointed to by the .gitmodules file may not exist
any more, or it may have forgotten the wanted commit.)

In case of non bare:

    Get the repo and all its submodules from the specified remote.
    (As the submodule is right there, that's the best guess to get it from,
    no need to get it from somewhere else. The submodule at the remote
    is the closest match you can get for replicating the superproject with
    its submodules.)

This way is heavily underutilized, as I would guess it wasn't exercised
as often, so the "wrong" default (to obtain the submodule information from
.gitmodules instead of from the remote directly) was not pointed out before.

Now that the client wants to make a decision where to get the
submodules from, based on the bare-ness of the remote, it may
require changes in the wire protocol, such that the remote simply
advertises it is a (non-)bare repository when you clone the superproject
from it. Then the client can make a better decision where to get the
submodules from.
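
As a rough sketch of that client-side decision, in Python: everything
here is hypothetical.  The remote_is_bare flag stands in for the
advertisement proposed above, which the wire protocol does not provide
today.  The function name, the "worktree URL + submodule path" location,
and keeping the .gitmodules url as a fallback are assumptions of this
sketch.

```python
from typing import List


def submodule_fetch_candidates(
    remote_worktree_url: str,
    remote_is_bare: bool,
    submodule_path: str,
    gitmodules_url: str,
) -> List[str]:
    """Return URLs to try for one submodule, most preferred first (hypothetical)."""
    candidates = []
    if not remote_is_bare:
        # Non-bare remote: try the submodule checked out at the remote first.
        # Assumption: it is reachable at <worktree URL>/<submodule path>.
        candidates.append(f"{remote_worktree_url.rstrip('/')}/{submodule_path}")
    # Assumption: the (already resolved) .gitmodules url is kept as a fallback.
    candidates.append(gitmodules_url)
    return candidates


if __name__ == "__main__":
    for bare in (True, False):
        print(bare, submodule_fetch_candidates(
            "http://localhost:8080", bare, "sub1", "https://example.org/sub1.git"))
```

For a bare remote only the .gitmodules url is returned, which matches
the traditional behaviour described above.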




>
>> >...<
>
>> I think on a hosting site they could even coexist when having the
>> layout as above.
>
>>          top.git/
>>          top.git/refs/{heads,tags,...}/...
>>          top.git/objects/...
>>          sub.git/
>>          sub.git/refs/{heads,tags,...}/...
>>          sub.git/objects/...
>
>>          # the following only exist in non bare:
>>          top.git/modules/sub.git/
>>          top.git/modules/sub.git/refs/{heads,tags,...}/...
>>          top.git/modules/sub.git/objects/...
>
>> The later files would be more reflective of what you *really*
>> want if you clone from top.git.
>
> may be there is no need for assumptions and .gitmodules should be
> sufficient?
>
> - absolute url in .gitmodules provides absolute URL/path to the
>   submodule of interest, regardless either submodule is present in
>   originating repository as updated submodule.  Either cloning it
>   instead of original repository would be more efficient is already a
>   heuristic which might fail miserably (may be I have a faster
>   connection to the original repository pointed by the absolute
>   url than to this particular repository)
>
> - relative url in .gitmodules provides relative location to the location
>   of the "top" repository, and that is only when that submodule "absolute"
>   url should be resolved relative to the one of the "top" repository

I think the .gitmodules file is not sufficient for the following reason:

* As a "downstream" user you cannot change remote locations without
altering the history. Maybe you just want to have a mirror of some cool
open source project without the hassle of always merging and maintaining
changes in your local submodule configuration. (cf. git config
url.<base>.insteadOf for repos, but specific to submodules)
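
For readers unfamiliar with url.<base>.insteadOf, here is a small Python
sketch of the prefix rewriting it performs (the longest matching
insteadOf prefix wins).  It only models the rewriting rule and is not
git's code; the function name apply_instead_of, the dict layout and the
example.com/example.org hosts are made up for illustration.

```python
from typing import Dict, List


def apply_instead_of(url: str, rewrites: Dict[str, List[str]]) -> str:
    """Rewrite url the way url.<base>.insteadOf does (simplified model).

    rewrites maps each <base> to the list of insteadOf prefixes configured for it.
    """
    best_base = None
    best_prefix = ""
    for base, prefixes in rewrites.items():
        for prefix in prefixes:
            # When several prefixes match, the longest one is used.
            if url.startswith(prefix) and len(prefix) > len(best_prefix):
                best_base, best_prefix = base, prefix
    if best_base is None:
        return url  # no prefix matched, url is left untouched
    return best_base + url[len(best_prefix):]


if __name__ == "__main__":
    # Hypothetical mirror setup: fetch from the mirror instead of upstream.
    rewrites = {"https://mirror.example.org/": ["https://upstream.example.com/"]}
    print(apply_instead_of("https://upstream.example.com/libs/sub2.git", rewrites))
```

A mirror operator can redirect fetches this way without touching the
committed .gitmodules file, which is the kind of local override meant
above.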
>
> NB I will consider it a separate issue either relative paths
> without '../' prefix are having any sense in bare repositories.

I guess it is a separate issue.

>
> or have I missed the point?
> --
> Yaroslav O. Halchenko
> Center for Open Neuroscience     http://centerforopenneuroscience.org
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> WWW:   http://www.linkedin.com/in/yarik


* Re: problems serving non-bare repos with submodules over http
  2016-04-21 17:11             ` Stefan Beller
@ 2016-04-21 17:45               ` Junio C Hamano
  2016-04-21 17:48                 ` Stefan Beller
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-04-21 17:45 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Yaroslav Halchenko, Git Gurus hangout, Benjamin Poldrack,
	Joey Hess, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> In case of non bare:
>
>     Get the repo and all its submodules from the specified remote.
>     (As the submodule is right there, that's the best guess to get it from,
>     no need to get it from somewhere else. The submodule at the remote
>     is the closest match you can get for replicating the superproject with
>     its submodules.)
>
> This way is heavy underutilized as it wasn't exercised as often I would
> guess, 

My guess is somewhat different. It is not used because the right
semantics for such a use case hasn't been defined yet (in other
words, what you suggested is _wrong_ as is).  You need to remember
that a particular clone may not be interested in all submodules, and
it is far from "the closest match".


* Re: problems serving non-bare repos with submodules over http
  2016-04-21 17:45               ` Junio C Hamano
@ 2016-04-21 17:48                 ` Stefan Beller
  2016-04-21 22:42                   ` Jacob Keller
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Beller @ 2016-04-21 17:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Yaroslav Halchenko, Git Gurus hangout, Benjamin Poldrack,
	Joey Hess, Jens Lehmann

On Thu, Apr 21, 2016 at 10:45 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> In case of non bare:
>>
>>     Get the repo and all its submodules from the specified remote.
>>     (As the submodule is right there, that's the best guess to get it from,
>>     no need to get it from somewhere else. The submodule at the remote
>>     is the closest match you can get for replicating the superproject with
>>     its submodules.)
>>
>> This way is heavy underutilized as it wasn't exercised as often I would
>> guess,
>
> My guess is somewhat different. It is not used because the right
> semantics for such a use case hasn't been defined yet (in other
> words, what you suggested is _wrong_ as is).  You need to remember
> that a particular clone may not be interested in all submodules, and
> it is far from "the closest match".

Yes, when that clone doesn't have some submodules, we can still fall back
on the .gitmodules file.

If you have a submodule, chances are you are interested in it and have
modified it. So the best chance of getting your changes is from your
remote, no?


* Re: problems serving non-bare repos with submodules over http
  2016-04-21 17:48                 ` Stefan Beller
@ 2016-04-21 22:42                   ` Jacob Keller
  0 siblings, 0 replies; 12+ messages in thread
From: Jacob Keller @ 2016-04-21 22:42 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Yaroslav Halchenko, Git Gurus hangout,
	Benjamin Poldrack, Joey Hess, Jens Lehmann

On Thu, Apr 21, 2016 at 10:48 AM, Stefan Beller <sbeller@google.com> wrote:
> On Thu, Apr 21, 2016 at 10:45 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> In case of non bare:
>>>
>>>     Get the repo and all its submodules from the specified remote.
>>>     (As the submodule is right there, that's the best guess to get it from,
>>>     no need to get it from somewhere else. The submodule at the remote
>>>     is the closest match you can get for replicating the superproject with
>>>     its submodules.)
>>>
>>> This way is heavy underutilized as it wasn't exercised as often I would
>>> guess,
>>
>> My guess is somewhat different. It is not used because the right
>> semantics for such a use case hasn't been defined yet (in other
>> words, what you suggested is _wrong_ as is).  You need to remember
>> that a particular clone may not be interested in all submodules, and
>> it is far from "the closest match".
>
> Yes, when that clone doesn't have some submodules, we can still fall back
> on the .gitmodules file.
>
> If you have a submodule chances are, you are interested in it and modified it.
> So the highest chance to get your changes is from your remote, no?
> --

I agree with Stefan. I think that if I clone from my local non-bare
repository that may have work done inside the submodule, it would be
best if the clone could grab the submodules directly from there and get
this work, which might not be in the "real" remote yet.

The case could be made that you don't want to do this, I suppose.
Generally, I think that since we're already connected to this remote, we
know we can access it, and getting submodules from there means we know it
will work and will give us the actual sha1 that the clone is using.

If we use .gitmodules, we'll possibly get a module that doesn't have
the commit, and the current gitmodules url might not even work
anymore.

That is, I don't really see any downside to Stefan's
proposal, and I see a bunch of upside.

Thanks,
Jake

