From: Yaroslav Halchenko <yoh@onerussian.com>
To: Git Gurus hangout <git@vger.kernel.org>
Cc: Benjamin Poldrack <benjaminpoldrack@gmail.com>,
	Joey Hess <id@joeyh.name>, Jens Lehmann <Jens.Lehmann@web.de>
Subject: Re: problems serving non-bare repos with submodules over http
Date: Wed, 20 Apr 2016 15:45:33 -0400
Message-ID: <20160420194533.GO23764@onerussian.com>
In-Reply-To: <CAGZ79kYS-F1yKpNP7jmhTiZT1R_pucUBBTCbmHKZz6Xd6dy8EA@mail.gmail.com>


On Wed, 20 Apr 2016, Stefan Beller wrote:
> > I do realize that the situation is quite uncommon, partially I guess due
> > to git submodules mechanism flexibility and power on one hand and
> > under-use (imho) on the other, which leads to discovery of regressions
> > [e.g. 1] and corner cases as mine.

> Thanks for fixing the under-use and reporting bugs. :)

I am thrilled to help ;)

> > [1] http://thread.gmane.org/gmane.comp.version-control.git/288064
> > [2] http://www.onerussian.com/tmp/git-web-submodules.sh

> > My use case:  We are trying to serve a git repository with submodules
> > specified with relative paths over http from a simple web server.  With a demo
> > case and submodule specification [complete script to reproduce including the
> > webserver using python is at 2] such as

> > (git)hopa:/tmp/gitxxmsxYFO[master]git
> > $> tree
> > .
> > ├── f1
> > └── sub1
> >     └── f2

> > $> cat .gitmodules
> > [submodule "sub1"]
> >     path = sub1
> >     url = ./sub1


> > 1. After cloning

> >     git clone http://localhost:8080/.git

> >    I cannot 'submodule update' the sub1 in the clone since its url after
> >    'submodule init' would be  http://localhost:8080/.git/sub1 .  If I manually fix
> >    it up -- it seems to proceed normally since in original repository I have
> >    sub1/.git/ directory and not the "gitlink" for that submodule.

> So the expected URL would be  http://localhost:8080/sub1/.git ?

ATM, yes
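
To make the URL mismatch concrete, here is a rough sketch of how a relative
submodule url gets joined to the URL the superproject was cloned from.
resolve_relative_url is a hypothetical helper of mine, a simplified model and
not git's actual code; it only handles leading ./ and ../ components:

```shell
# Hypothetical helper: simplified model of joining a relative submodule
# url (from .gitmodules) to the superproject's remote URL.
resolve_relative_url() {
    base=${1%/}      # superproject remote URL, trailing slash dropped
    rel=$2           # url value from .gitmodules
    while :; do
        case $rel in
            ./*)  rel=${rel#./} ;;                     # "./" keeps the base as is
            ../*) rel=${rel#../}; base=${base%/*} ;;   # "../" drops one component
            *)    break ;;
        esac
    done
    printf '%s/%s\n' "$base" "$rel"
}

# The case from this thread: url = ./sub1, cloned from .../.git
resolve_relative_url http://localhost:8080/.git ./sub1
# An assumed alternative spelling of the url, for comparison
resolve_relative_url http://localhost:8080/.git ../sub1/.git
```

The first call reproduces the http://localhost:8080/.git/sub1 URL described
above.  The second only illustrates that, under this simplified model, a url of
../sub1/.git would resolve to the expected http://localhost:8080/sub1/.git; I
have not verified that against git itself.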

> I thought you could leave out the .git suffix, i.e. you can type

>      git clone http://localhost:8080

> and Git will recognize the missing .git and try that as well. The relative URL
> would then be constructed as http://localhost:8080/sub1, which will use the
> same mechanism to find the missing .git ending.

[note1] Unfortunately that is not the case ATM (git version
2.8.1.369.geae769a; the output is interspersed with the log from Python's simple
HTTP server):

$> git clone http://localhost:8080 xxx                   
Cloning into 'xxx'...             
127.0.0.1 - - [20/Apr/2016 15:01:25] code 404, message File not found
127.0.0.1 - - [20/Apr/2016 15:01:25] "GET /info/refs?service=git-upload-pack HTTP/1.1" 404 -
fatal: repository 'http://localhost:8080/' not found


> > 2. If I serve the clone [2 demos that too] itself, there is no easy remedy at
> >    all since sub1/.git is not a directory but a gitlink.

> Not sure I understand the second question.

If I serve via http a repository where sub1/.git is a "gitlink":

    (git)hopa:/tmp/gitxxmsxYFO_[master]
    $> cat sub1/.git 
    gitdir: ../.git/modules/sub1

Such a repository cannot be cloned:

    (git)hopa:/tmp/gitxxmsxYFO_[master]git
    $> git clone http://localhost:8080/sub1 /tmp/xxx
    Cloning into '/tmp/xxx'...                      
    127.0.0.1 - - [20/Apr/2016 15:04:01] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:01] "GET /sub1/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/' not found

    $> git clone http://localhost:8080/sub1/.git /tmp/xxx 
    Cloning into '/tmp/xxx'...
    127.0.0.1 - - [20/Apr/2016 15:04:06] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:06] "GET /sub1/.git/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/.git/' not found
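
Both requests 404 because here sub1/.git is a one-line file, so there is no
sub1/.git/info/refs for a plain file server to hand out; the actual repository
lives under the superproject's .git/modules/sub1.  A minimal sketch of
following that pointer on the local file system (real_git_dir is a hypothetical
helper, and the temporary directory only mimics the layout above):

```shell
# Hypothetical helper: print the real git directory of a work tree whose
# .git is either a directory or a "gitdir: <path>" file.
real_git_dir() {
    wt=$1
    if [ -d "$wt/.git" ]; then
        printf '%s\n' "$wt/.git"
    elif [ -f "$wt/.git" ]; then
        # Take the path after "gitdir: "; relative paths are relative
        # to the directory that holds the .git file.
        target=$(sed -n 's/^gitdir: //p' "$wt/.git")
        case $target in
            /*) printf '%s\n' "$target" ;;
            *)  printf '%s\n' "$wt/$target" ;;
        esac
    else
        return 1     # no .git entry at all
    fi
}

# Throwaway layout imitating the served clone
demo=$(mktemp -d)
mkdir -p "$demo/sub1" "$demo/.git/modules/sub1"
printf 'gitdir: ../.git/modules/sub1\n' > "$demo/sub1/.git"
real_git_dir "$demo/sub1"
```

A remote client has no such shortcut over plain http unless it fetches and
interprets sub1/.git itself, which is what question c. is about.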


> > N.B. I haven't approached nested submodules case yet in [2]

> > I wondered

> > a. could 'git clone' (probably actually some relevant helper used by fetch
> >    etc) acquire ability to sense for URL/.git if URL itself doesn't point to a
> >    usable git repository?

> So you mean in case of relative submodules, we need to take the parent
> url, and remove the ".git" at the end and try again if we cannot find
> the submodule?

That would be a.2, which I forgot to outline ;)

In a. I was suggesting exactly what you assumed [note1 above] already happens
(but ATM doesn't): that /.git would be sensed automagically.
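
A sketch of the sensing proposed in a., written as the list of info/refs URLs a
client could try in order.  candidate_probe_urls is hypothetical; as the logs
above show, git ATM requests only the first of them:

```shell
# Hypothetical: info/refs URLs to probe if a client fell back to URL/.git.
# Only the first line matches what git requested in the logs above.
candidate_probe_urls() {
    url=${1%/}
    printf '%s/info/refs?service=git-upload-pack\n' "$url"
    case $url in
        */.git) ;;   # already points at a .git directory; no fallback to add
        *) printf '%s/.git/info/refs?service=git-upload-pack\n' "$url" ;;
    esac
}

candidate_probe_urls http://localhost:8080/sub1
```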

> >     I think this could provide complete remedy for 1 since then relative urls
> >     would be properly assembled, with similar 'sensing' for /.git for the final urls

> >     I guess we could do it with rewrites/forwards on the "server side",
> >     but it wouldn't be generally acceptable solution.

> > b. is there a better or already existing way to remedy my situation?

> > c. shouldn't "git clone" (or the relevant helper) be aware of remote
> >    /.git possibly being a gitlink file within submodule?

> Oh. I think that non-bare repositories including submodules are not designed
> to be cloned, because they are for use in the file system.

Well -- that is the beauty of git being a distributed VCS: non-bare repos
seem to be as nicely cloneable as bare ones. And in general that seems to work
with submodules as well, since they should be philosophically "consistent"...

>  Even a local clone fails:

>     # gerrit is a project I know which also has submodules:
>     git clone --recurse-submodules https://gerrit.googlesource.com/gerrit g1
>     git clone --recurse-submodules g1 g2
>     ...
> fatal: clone of '...' into submodule path '...' failed

I guess that is just yet another bug with relative paths in the
submodules.

> So I think for cloning repositories you want to have each repository
> as its own thing (bare or non bare).

The first line of your example above somewhat demonstrates the
counter-argument to that statement.  Indeed each repository should be its own
thing, just possibly registered as a submodule of another one.

> The submodule mechanism is just a way to express a relation between
> the repositories, it's like composing them together, but by that composition
> it breaks the properties of each repository to be easily clonable.

It doesn't really (except in the cases we both pointed out).  E.g. I can just as
easily clone the original sub1 repository which was registered as a submodule of
another one.  Whether git's treatment of submodules during cloning (placing them
under the root repo's .git/modules, etc) undermines that feature is a question
we could also discuss here somewhat, I guess ;)

> I think we should fix that.

would be awesome! Thanks in advance ;)

> I guess the local clone case is 'easy' as you only need
> to handle the link instead of directory thing correctly.

> For the case you describe (cloning from a remote, whether it is http or ssh),
> we would need to discuss security implications I would assume? It sounds
> scary at first to follow a random git link to the outer space of the repository.

More like "into the inner space".  git already (as the example above shows)
descends right away into "/info/refs?", so how would sensing "/.git/" be any
different?

> (A similar thing is that you cannot have symlinks in a git repository pointing
> outside of it, IIRC? At least that was fishy.)

That might indeed be dangerous.  But once again, per the argument above, I guess
it is similarly up to the "provider" to guarantee protection, e.g. by forbidding
the webserver to follow symlinks in the served directory if the content is not
under their control.

Cheers and thanks for your quick reply Stefan!
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        
