From: Yaroslav Halchenko <yoh@onerussian.com>
To: Git Gurus hangout <git@vger.kernel.org>
Cc: Benjamin Poldrack <benjaminpoldrack@gmail.com>,
	Joey Hess <id@joeyh.name>, Jens Lehmann <Jens.Lehmann@web.de>
Subject: Re: problems serving non-bare repos with submodules over http
Date: Wed, 20 Apr 2016 15:45:33 -0400
Message-ID: <20160420194533.GO23764@onerussian.com>
In-Reply-To: <CAGZ79kYS-F1yKpNP7jmhTiZT1R_pucUBBTCbmHKZz6Xd6dy8EA@mail.gmail.com>


On Wed, 20 Apr 2016, Stefan Beller wrote:
> > I do realize that the situation is quite uncommon, partially I guess due
> > to git submodules mechanism flexibility and power on one hand and
> > under-use (imho) on the other, which leads to discovery of regressions
> > [e.g. 1] and corner cases as mine.

> Thanks for fixing the under-use and reporting bugs. :)

I am thrilled to help ;)

> > [1] http://thread.gmane.org/gmane.comp.version-control.git/288064
> > [2] http://www.onerussian.com/tmp/git-web-submodules.sh

> > My use case:  We are trying to serve a git repository with submodules
> > specified with relative paths over http from a simple web server.  With a demo
> > case and submodule specification [complete script to reproduce including the
> > webserver using python is at 2] such as

> > (git)hopa:/tmp/gitxxmsxYFO[master]git
> > $> tree
> > .
> > ├── f1
> > └── sub1
> >     └── f2

> > $> cat .gitmodules
> > [submodule "sub1"]
> >     path = sub1
> >     url = ./sub1


> > 1. After cloning

> >     git clone http://localhost:8080/.git

> >    I cannot 'submodule update' the sub1 in the clone since its url after
> >    'submodule init' would be  http://localhost:8080/.git/sub1 .  If I manually fix
> >    it up -- it seems to proceed normally since in original repository I have
> >    sub1/.git/ directory and not the "gitlink" for that submodule.

> So the expected URL would be  http://localhost:8080/sub1/.git ?

ATM, yes
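
To illustrate where the URL in point 1 comes from, here is a minimal Python
sketch.  It assumes the relative submodule url is simply appended to the
parent's remote URL after stripping a leading "./" (and dropping one path
component per leading "../").  The function name resolve_relative_url is
hypothetical and this is a simplification, not git's actual code:

```python
def resolve_relative_url(remote_url: str, sub_url: str) -> str:
    """Join a relative submodule url onto the parent's remote URL.

    Simplified model (an assumption, not git's implementation):
    "./"  keeps the remote URL as the base,
    "../" drops the last path component of the remote URL.
    """
    base = remote_url.rstrip("/")
    rel = sub_url
    while True:
        if rel.startswith("./"):
            rel = rel[2:]
        elif rel.startswith("../"):
            rel = rel[3:]
            base = base.rsplit("/", 1)[0]
        else:
            break
    return f"{base}/{rel}"


# The clone URL from point 1 combined with "url = ./sub1" from .gitmodules
print(resolve_relative_url("http://localhost:8080/.git", "./sub1"))
```

With the clone URL ending in /.git, the "./sub1" lands under /.git/ rather
than next to it, which is the mismatch described above.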

> I thought you could leave out the .git suffix, i.e. you can type

>      git clone http://localhost:8080

> and Git will recognize the missing .git and try that as well. The relative URL
> would then be constructed as http://localhost:8080/sub1, which will use the
> same mechanism to find the missing .git ending.

[note 1] Unfortunately that is not the case ATM (git version
2.8.1.369.geae769a; the output is interspersed with the log from Python's
simple http server):

$> git clone http://localhost:8080 xxx                   
Cloning into 'xxx'...             
127.0.0.1 - - [20/Apr/2016 15:01:25] code 404, message File not found
127.0.0.1 - - [20/Apr/2016 15:01:25] "GET /info/refs?service=git-upload-pack HTTP/1.1" 404 -
fatal: repository 'http://localhost:8080/' not found


> > 2. If I serve the clone [2 demos that too] itself, there is no easy remedy at
> >    all since sub1/.git is not a directory but a gitlink.

> Not sure I understand the second question.

If I serve via http a repository where sub1/.git is a "gitlink":

    (git)hopa:/tmp/gitxxmsxYFO_[master]
    $> cat sub1/.git 
    gitdir: ../.git/modules/sub1

Such a repository cannot be cloned:

    (git)hopa:/tmp/gitxxmsxYFO_[master]git
    $> git clone http://localhost:8080/sub1 /tmp/xxx
    Cloning into '/tmp/xxx'...                      
    127.0.0.1 - - [20/Apr/2016 15:04:01] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:01] "GET /sub1/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/' not found

    $> git clone http://localhost:8080/sub1/.git /tmp/xxx 
    Cloning into '/tmp/xxx'...
    127.0.0.1 - - [20/Apr/2016 15:04:06] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:06] "GET /sub1/.git/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/.git/' not found
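
For illustration only, a rough Python sketch of what "being aware of the
gitlink" could mean on the client side: read the "gitdir: ..." line and
resolve it relative to the submodule's directory.  The helper names
parse_gitlink and follow_gitlink are hypothetical, and nothing like this
happens in git over http ATM as far as the output above shows:

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def parse_gitlink(content: str) -> Optional[str]:
    """Return the target of a 'gitdir: <path>' file, or None if it is not one."""
    line = content.strip()
    prefix = "gitdir: "
    if line.startswith(prefix):
        return line[len(prefix):]
    return None


def follow_gitlink(worktree_url: str, gitlink_content: str) -> Optional[str]:
    """Resolve the gitdir target relative to the directory holding the .git file."""
    target = parse_gitlink(gitlink_content)
    if target is None:
        return None
    parts = urlsplit(worktree_url)
    # The gitdir path is relative to the directory that contains the .git file.
    path = posixpath.normpath(posixpath.join(parts.path or "/", target))
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# sub1/.git content from the listing above
print(follow_gitlink("http://localhost:8080/sub1", "gitdir: ../.git/modules/sub1\n"))
```

For the example above this points at /.git/modules/sub1 under the same
server, i.e. inside the served top-level repository.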


> > N.B. I haven't approached nested submodules case yet in [2]

> > I wondered

> > a. could 'git clone' (probably actually some relevant helper used by fetch
> >    etc) acquire ability to sense for URL/.git if URL itself doesn't point to a
> >    usable git repository?

> So you mean in case of relative submodules, we need to take the parent
> url, and remove the ".git" at the end and try again if we cannot find
> the submodule?

That would be a.2, which I forgot to outline ;)

In a. I was suggesting what you assumed [note 1 above] already happens
(but ATM doesn't): that /.git would be automagically sensed.
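
A minimal Python sketch of the kind of "sensing" I have in mind in a.,
under the assumption that the client probes the same /info/refs request
seen in the logs above, first at URL and then at URL/.git.  The names
info_refs_url, sense_repo_url and the is_reachable callback are
hypothetical, and this is not how git behaves ATM:

```python
from typing import Callable, Optional


def info_refs_url(repo_url: str) -> str:
    """Build the discovery request that appears in the server log above."""
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def sense_repo_url(url: str, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Try URL first, then URL/.git; return the first candidate that answers."""
    base = url.rstrip("/")
    candidates = [base]
    if not base.endswith("/.git"):
        candidates.append(base + "/.git")
    for candidate in candidates:
        if is_reachable(info_refs_url(candidate)):
            return candidate
    return None


# Stand-in for the web server: only the non-bare layout (/.git) is served.
served = {"http://localhost:8080/.git/info/refs?service=git-upload-pack"}
print(sense_repo_url("http://localhost:8080", lambda u: u in served))
```

The is_reachable callback is kept abstract on purpose, so the sketch says
nothing about how the actual probing would be done.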

> >     I think this could provide complete remedy for 1 since then relative urls
> >     would be properly assembled, with similar 'sensing' for /.git for the final urls

> >     I guess we could do it with rewrites/forwards on the "server side",
> >     but it wouldn't be generally acceptable solution.

> > b. is there a better or already existing way to remedy my situation?

> > c. shouldn't "git clone" (or the relevant helper) be aware of remote
> >    /.git possibly being a gitlink file within submodule?

> Oh. I think that non-bare repositories including submodules are not designed
> to be cloned, because they are for use in the file system.

Well -- that is the beauty of git being a distributed VCS: non-bare repos
seem to be as nicely cloneable as bare ones. And in general it seems to work
with submodules as well, since they should behave "consistently",
philosophically...

>  Even a local clone fails:

>     # gerrit is a project I know which also has submodules:
>     git clone --recurse-submodules https://gerrit.googlesource.com/gerrit g1
>     git clone --recurse-submodules g1 g2
>     ...
> fatal: clone of '...' into submodule path '...' failed

I guess that is just yet another bug with relative paths in the
submodules.

> So I think for cloning repositories you want to have each repository
> as its own thing (bare or non bare).

In the first line of your example above you have somewhat shown the
counter-argument to that statement.  Indeed, each repository should be its own
thing, just possibly registered as a submodule of another one.

> The submodule mechanism is just a way to express a relation between
> the repositories, it's like composing them together, but by that composition
> it breaks the properties of each repository to be easily clonable.

It doesn't really (except in the cases we both pointed out).  E.g. I can just
as easily clone the original sub1 repository which was registered as a
submodule of another one.  Whether the treatment of them by git during cloning
(placing them under the root repo's .git/modules, etc) undermines that
feature -- that is a question we could also discuss here somewhat, I guess ;)

> I think we should fix that.

would be awesome! Thanks in advance ;)

> I guess the local clone case is 'easy' as you only need
> to handle the link instead of directory thing correctly.

> For the case you describe (cloning from a remote, whether it is http or ssh),
> we would need to discuss security implications I would assume? It sounds
> scary at first to follow a random git link to the outer space of the repository.

More like "into the inner space".  git already (as the example above shows)
descends right away into "/info/refs?", so how would sensing "/.git/" be any
different?

> (A similar thing is that you cannot have symlinks in a git repository pointing
> outside of it, IIRC? At least that was fishy.)

That might indeed be dangerous.  But once again, per the argument above, it is
similarly up to the "provider", I guess, to guarantee protection, e.g. by
forbidding following symlinks on the webserver for that served directory if
the content is not under their control.

Cheers and thanks for your quick reply Stefan!
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        
