git.vger.kernel.org archive mirror
* [RFC] GIT paths
@ 2005-10-24  8:50 Junio C Hamano
  2005-10-25 12:31 ` Andreas Ericsson
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2005-10-24  8:50 UTC (permalink / raw)
  To: git

Our networking commands can take either a URL or a non-URL form
to specify a remote repository.  This note first attempts to
clarify what <path> means in the current implementation, and then
discusses two possible enhancements.

A non URL always refers to the pack protocol going over an SSH
connection:

	<host> ':' <path>

	. A path that starts with a slash '/' is an absolute path
	  on the remote site.

	. A path that does not start with a slash '/' is relative
	  to the home directory of the incoming user.

However, note that the administrator could futz with the login
shell of the user to give restricted access (chroot to change
the former).  The latter can be made different from the home
directory, if git-shell is changed to chdir() to somewhere else
first.  I am not suggesting this as a best practice -- just
mentioning the possibility for completeness.
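
As an illustration only (this is not code from git), the non-URL rule
above could be sketched in C roughly as follows.  The names
split_host_path() and enum path_kind are made up for this sketch, and
it assumes the host part contains no colon of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the <path> part of "<host>:<path>". */
enum path_kind { PATH_INVALID, PATH_ABSOLUTE, PATH_HOME_RELATIVE };

/*
 * Split "host:path" at the first colon.  On success, copy the host
 * into host_buf, point *path at the path inside spec, and report
 * whether the path is absolute or relative to the home directory of
 * the incoming user.
 */
enum path_kind split_host_path(const char *spec, char *host_buf,
			       size_t host_len, const char **path)
{
	const char *colon = strchr(spec, ':');
	size_t n;

	if (!colon || colon == spec || !colon[1])
		return PATH_INVALID;	/* no host, or no path */
	n = (size_t)(colon - spec);
	if (n >= host_len)
		return PATH_INVALID;	/* host does not fit in host_buf */
	memcpy(host_buf, spec, n);
	host_buf[n] = '\0';
	*path = colon + 1;
	return **path == '/' ? PATH_ABSOLUTE : PATH_HOME_RELATIVE;
}
```

With this sketch, "host.xz:/frotz/repo" is classified as absolute and
"host.xz:repo.git" as relative to the home directory.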

A URL form is:

	<proto> '://' <host> ( ':' <port> ) '/' <rest-of-path>

and <proto> is either 'git', or 'ssh' (also spelled as 'ssh+git'
or 'git+ssh').  In addition, you can use 'http' or 'rsync', but
these transports are not discussed further here.  They already
have established semantics for <path> = '/' + <rest-of-path>.

For connections over plain TCP talking with git-daemon, or over
SSH in this form, the path is always relative to the root
directory on the remote site, because the '/' that terminates
either <host> or <port> starts the <path> = '/' + <rest-of-path>.
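
To make the URL form concrete, here is a minimal sketch of a parser
for "<proto>://<host>[:<port>]/<rest-of-path>".  It is an assumption
of how one might write it, not the code in connect.c; struct git_url
and parse_git_url() are hypothetical names, and bracketed IPv6 hosts
are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

struct git_url {
	char proto[16];
	char host[256];
	char port[8];		/* empty when no port was given */
	const char *path;	/* points into the input; starts with '/' */
};

/*
 * Parse "<proto>://<host>[:<port>]/<rest-of-path>".
 * Returns 0 on success and -1 if the string is not in that form.
 */
int parse_git_url(const char *url, struct git_url *out)
{
	const char *sep = strstr(url, "://");
	const char *host, *slash, *colon, *host_end;
	size_t n;

	if (!sep || sep == url)
		return -1;
	n = (size_t)(sep - url);
	if (n >= sizeof(out->proto))
		return -1;
	memcpy(out->proto, url, n);
	out->proto[n] = '\0';

	host = sep + 3;
	slash = strchr(host, '/');
	if (!slash)
		return -1;	/* the '/' ending <host> or <port> is required */

	/* an optional ':' <port> sits between the host and that slash */
	colon = memchr(host, ':', (size_t)(slash - host));
	host_end = colon ? colon : slash;
	n = (size_t)(host_end - host);
	if (!n || n >= sizeof(out->host))
		return -1;
	memcpy(out->host, host, n);
	out->host[n] = '\0';

	n = colon ? (size_t)(slash - colon - 1) : 0;
	if (n >= sizeof(out->port))
		return -1;
	if (n)
		memcpy(out->port, colon + 1, n);
	out->port[n] = '\0';

	out->path = slash;	/* <path> = '/' + <rest-of-path> */
	return 0;
}
```

Note that the path handed back always keeps its leading slash, which
is exactly why this form cannot express a home-relative path today.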

There are two things I would like to discuss here.

 - It might make sense to have SERVER_ROOT (similar to
   DOCUMENT_ROOT in Apache) for git-daemon, so <path> does not
   have to be relative to the true filesystem root.  Note that
   this is not a security measure, but meant for administration
   convenience [*1*].

 - Over a git-daemon connection, supporting ~user expansion
   makes sense.  E.g. git://host.xz/~junio/ refers to my home
   directory on that machine.  It would make it impossible to
   have a directory literally named '~junio' directly underneath
   the root directory, but that is a good limitation anyway.

The above enhancements, especially SERVER_ROOT, however make
paths inconsistent between the non-URL form and the URL form.
This probably is OK -- people are used to using different paths
when uploading to an HTTP server and testing a download from it.
That leaves one issue.  Do we want to support ~user expansion,
and if so how, on non git-daemon connections?

I would propose that

	git fetch host.xz:~junio/repo
	git fetch ssh://host.xz/~junio/repo

mean the same thing (i.e. both understand ~user expansion).
Also these are equivalent (i.e. no ~user expansion; both mean
absolute filesystem path without SERVER_ROOT prefixing):

	git fetch host.xz:/frotz/repo
	git fetch ssh://host.xz/frotz/repo

While these two might not mean the same thing (the former is
prefixed with SERVER_ROOT, but not the latter):

	git fetch git://host.xz/frotz/repo
	git fetch ssh://host.xz/frotz/repo

There are small technical issues.

 - connect.c should not be affected at all, since it does not
   know how the remote site arranges SERVER_ROOT (if we support
   it) or user home directories.

 - ssh://host.xz/path and host.xz:path connections spawn
   upload-pack or receive-pack directly, without being mediated
   by git-daemon.  This means that ~user expansion, if we want
   to support it, needs to be done by these programs themselves.

 - git-daemon needs to validate the incoming requested path and
   in order to avoid aliasing issues, we should resolve ~user
   expansion and SERVER_ROOT prefixing first, then validate the
   resulting path against white/black list, before calling
   upload-pack or receive-pack.  However, after git-daemon
   decides to run these programs, they could find out some
   problems with the specified repository and may need to report
   them.  Arguably, this reporting should not reveal the real
   path used to address the repository [*2*].
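
A minimal sketch of the whitelist half of that check, assuming the
path has already gone through ~user expansion and SERVER_ROOT
prefixing and contains no ".." components.  path_is_whitelisted() is
a hypothetical name, not an existing git-daemon function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the already-canonicalized path is equal to, or lies
 * underneath, one of the allowed directories, and 0 otherwise.
 * The comparison respects path components, so "/pub/git" does not
 * accidentally allow "/pub/gitolite".
 */
int path_is_whitelisted(const char *real_path,
			const char *const *allowed, size_t nr_allowed)
{
	size_t i;

	for (i = 0; i < nr_allowed; i++) {
		size_t len = strlen(allowed[i]);

		/* ignore trailing slashes on the entry, but keep a lone "/" */
		while (len > 1 && allowed[i][len - 1] == '/')
			len--;
		if (!len || strncmp(real_path, allowed[i], len))
			continue;
		if (real_path[len] == '\0' || real_path[len] == '/' ||
		    allowed[i][len - 1] == '/')
			return 1;
	}
	return 0;
}
```

Because the check runs on the resolved path, two different spellings
of the same repository cannot slip past it in different ways.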

Although we _could_ forget about the "error reporting exposing
real path" issue for now, I think we should at least have a plan
to make things consistent and well defined.  Here is a strawman:

 - Have a common library code that takes user supplied path and
   does SERVER_ROOT prefixing and ~user expansion.

 - Have git-daemon use it to canonicalize the requested path
   before validating.  Make it invoke the programs with the path
   received from the other end (before SERVER_ROOT prefixing, or
   ~user expansion).

 - Give --server-root=/path/to/root flag to programs that can be
   called by git-daemon, and have git-daemon run them with this
   flag.  Have them use the same library to canonicalize the
   requested path to the real path.  When these programs are run
   via direct SSH connection (i.e. ssh://host/path and
   host:path), this flag is not given so they see filesystem
   path as-is, but make the ~user expansion still available.
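
The common library function in the strawman could look roughly like
the sketch below.  Everything named here is hypothetical:
canonicalize_path(), the home_lookup_fn callback, and the
demo_home_lookup() stand-in that knows a single made-up user.  A real
lookup would presumably consult the password database, e.g. via
getpwnam(3).  The sketch treats "/~user" like "~user" and does not
deal with ".." components.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: return the home directory of user, or NULL. */
typedef const char *(*home_lookup_fn)(const char *user);

/* Stand-in lookup for illustration; knows only one made-up user. */
const char *demo_home_lookup(const char *user)
{
	return strcmp(user, "junio") ? NULL : "/home/junio";
}

/*
 * Turn the path sent by the client into a real filesystem path.
 *  - "~user/rest" (optionally with one leading '/') becomes
 *    <home of user>/rest;
 *  - anything else gets server_root prepended when one is given
 *    (the git-daemon case) and is used as-is when server_root is
 *    NULL (the direct SSH case).
 * Returns 0 on success, -1 on unknown user or too-small buffer.
 */
int canonicalize_path(const char *req, const char *server_root,
		      home_lookup_fn lookup, char *out, size_t out_len)
{
	int n;

	if (req[0] == '/' && req[1] == '~')
		req++;		/* treat "/~user" like "~user" */

	if (req[0] == '~') {
		char user[64];
		const char *rest = strchr(req, '/');
		size_t ulen = rest ? (size_t)(rest - req - 1) : strlen(req + 1);
		const char *home;

		if (!ulen || ulen >= sizeof(user))
			return -1;
		memcpy(user, req + 1, ulen);
		user[ulen] = '\0';
		home = lookup(user);
		if (!home)
			return -1;
		n = snprintf(out, out_len, "%s%s", home, rest ? rest : "");
	} else {
		n = snprintf(out, out_len, "%s%s",
			     server_root ? server_root : "", req);
	}
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

git-daemon would call this with its configured root, while
upload-pack and receive-pack run over SSH would pass NULL for
server_root and still get ~user expansion.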


[Footnote]

*1* You do not want to advertise that your repo is at
/mnt/disk1/repo and find out that you need to move the disks
around the next day.
Of course you could plan ahead and have a symlink hanging below
the root directory (e.g. '/pub -> /mnt/disk1/git'), but it is so
much more convenient if you can just tell git-daemon that the
root level used to be /mnt/disk1/git but it is now somewhere
else.

*2* This is theoretical right now, since packed transfer
protocols cannot report errors back, but Andreas' patch
addresses this issue by dying carefully in srvside_chdir().  It
falls into security-by-obscurity category, so we may choose not
to worry about it, though.


* Re: [RFC] GIT paths
  2005-10-24  8:50 [RFC] GIT paths Junio C Hamano
@ 2005-10-25 12:31 ` Andreas Ericsson
  2005-10-25 16:53   ` H. Peter Anvin
  2005-10-26  6:12   ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Andreas Ericsson @ 2005-10-25 12:31 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:
> Our networking commands can take either URL or non URL to
> specify remote repository.  This note first attempts to clarify
> what <path> means in the current implementation, and then
> discusses two possible enhancements.
> 
> For connections over plain TCP talking with git-daemon, or over
> SSH in this form, path is always relative to the root directory
> on the remote site, because '/' that terminate either <host> or
> <port> starts the <path> = '/' + <rest-of-path>.
> 
> There are two things I would like to discuss here.
> 
>  - It might make sense to have SERVER_ROOT (similar to
>    DOCUMENT_ROOT in Apache) for git-daemon, so <path> does not
>    have to be relative to the true filesystem root.  Note that
>    this is not a security measure, but meant for administration
>    convenience [*1*].
> 
>  - Over a git-daemon connection, supporting ~user expansion
>    makes sense.  E.g. git://host.xz/~junio/ refers to my home
>    directory on that machine.  It would make it impossible to
>    have a directory literally named '~junio' directly underneath
>    the root directory, but that is a good limitation anyway.
> 

I like this idea, although I'd extend it with a Userdir-like config 
option in git-daemon (like ~/public_html for Apache). This makes it a 
bit easier to see what's published and what isn't.

About the literally named /~junio directory, it would be possible with 
this syntax:

	git fetch host.xz:/~junio

The userdir is (with my previous patch) only expanded if the path starts 
with a tilde.

> The above enhancements, especially SERVER_ROOT, however make
> paths inconsistent between non URL form and URL form.  This
> probably is OK -- people are used to using different paths when
> uploading to HTTP server and testing a download from it.  That
> leaves one issue.  Do we want to support ~user expansion, and if
> so how, on non git-daemon connections?
> 
> I would propose that
> 
> 	git fetch host.xz:~junio/repo
> 	git fetch ssh://host.xz/~junio/repo
> 
> mean the same thing (i.e. both understand ~user expansion).
> Also these are equivalent (i.e. no ~user expansion; both mean
> absolute filesystem path without SERVER_ROOT prefixing):
> 
> 	git fetch host.xz:/frotz/repo
> 	git fetch ssh://host.xz/frotz/repo
> 
> While these two might not mean the same thing (the former is
> prefixed with SERVER_ROOT, but not the latter):
> 
> 	git fetch git://host.xz/frotz/repo
> 	git fetch ssh://host.xz/frotz/repo
> 
> There are small technical issues.
> 
>  - connect.c should not be affected at all, since it does not
>    know how the remote site arranges SERVER_ROOT (if we support
>    it) or user home directories.
> 

It must remove the leading slash for this syntax:

	ssh://host.xz/~junio/repo

Otherwise it would be passed as /~junio/repo to the remote end and no 
~user interpolation would be done.

> 
>  - git-daemon needs to validate the incoming requested path and
>    in order to avoid aliasing issues, we should resolve ~user
>    expansion and SERVER_ROOT prefixing first, then validate the
>    resulting path against white/black list, before calling
>    upload-pack or receive-pack.  However, after git-daemon
>    decides to run these programs, they could find out some
>    problems with the specified repository and may need to report
>    them.  Arguably, this reporting should not reveal the real
>    path used to address the repository [*2*].
> 

This could be done by writing the relative path in the error message:

	.git/foo/bar: failed to do something nifty

The user or the admin should know where that path is and will know what 
to do. Messages logged on the server side should of course hold the full 
path.
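
A rough sketch of that split between client-visible and logged
messages.  client_visible_path() and format_error() are hypothetical
helpers for illustration, and the sketch assumes repo_dir is given
without a trailing slash.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the part of full_path that is relative to repo_dir, or the
 * last path component when full_path is not inside repo_dir, so the
 * result never reveals where the repository lives on the server.
 */
const char *client_visible_path(const char *repo_dir, const char *full_path)
{
	size_t len = strlen(repo_dir);
	const char *slash;

	if (len && !strncmp(full_path, repo_dir, len) && full_path[len] == '/')
		return full_path + len + 1;
	slash = strrchr(full_path, '/');
	return slash ? slash + 1 : full_path;
}

/* Format one error twice: a relative form for the client, a full form for the log. */
void format_error(const char *repo_dir, const char *full_path, const char *what,
		  char *for_client, char *for_log, size_t len)
{
	snprintf(for_client, len, "%s: %s",
		 client_visible_path(repo_dir, full_path), what);
	snprintf(for_log, len, "%s: %s", full_path, what);
}
```

For a repository at /mnt/disk1/git/repo, the client would see only
".git/foo/bar: ..." while the server log keeps the full path.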

> Although we _could_ forget about the "error reporting exposing
> real path" issue for now, I think we should at least have a plan
> to make things consistent and well defined.  Here is a strawman:
> 
>  - Have a common library code that takes user supplied path and
>    does SERVER_ROOT prefixing and ~user expansion.
> 
>  - Have git-daemon use it to canonicalize the requested path
>    before validating.  Make it invoke the programs with the path
>    received from the other end (before SERVER_ROOT prefixing, or
>    ~user expansion).
> 

I'd say make it invoke the programs with the canonicalized path. As you 
say, git-daemon has to verify that it's a proper git repo and in the 
whitelist anyway so I think it would be silly to add extra complexity to 
upload-pack and receive-pack.

git-daemon could of course present some uniform error message if 
git-upload-pack or git-receive-pack fails, but this wouldn't really be 
necessary if they use relative paths as mentioned above (someone who 
makes one of those two fail while working will already know the path).

>  - Give --server-root=/path/to/root flag to programs that can be
>    called by git-daemon, and have git-daemon run them with this
>    flag.  Have them use the same library to canonicalize the
>    requested path to the real path.  When these programs are run
>    via direct SSH connection (i.e. ssh://host/path and
>    host:path), this flag is not given so they see filesystem
>    path as-is, but make the ~user expansion still available.
> 

If we stick with canonicalized paths I suppose this can be dropped.

> 
> [Footnote]
> 
> *2* This is theoretical right now, since packed transfer
> protocols cannot report errors back, but Andreas' patch
> addresses this issue by dying carefully in srvside_chdir().  It
> falls into security-by-obscurity category, so we may choose not
> to worry about it, though.
> 

Keeping valid usernames hidden is normally considered best practice.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [RFC] GIT paths
  2005-10-25 12:31 ` Andreas Ericsson
@ 2005-10-25 16:53   ` H. Peter Anvin
  2005-10-26  6:12   ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: H. Peter Anvin @ 2005-10-25 16:53 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: git

Andreas Ericsson wrote:
>>
>>  - Over a git-daemon connection, supporting ~user expansion
>>    makes sense.  E.g. git://host.xz/~junio/ refers to my home
>>    directory on that machine.  It would make it impossible to
>>    have a directory literally named '~junio' directly underneath
>>    the root directory, but that is a good limitation anyway.
>>
> 
> I like this idea, although I'd extend it with a Userdir-like config 
> option in git-daemon (like ~/public_html for apache). This makes it a 
> bit easier to see what's published and what isn't.
> 

I've found that whenever one does a network daemon which exports paths, 
sooner or later one wants namespace management.  In Linux, of course, 
there are a lot more tricks one can play to actually create the 
namespace one wants in the filesystem (although it's complicated by 
needing to have an exec-worthy environment.)

It might be worth considering creating a library to do this.

	-hpa


* Re: [RFC] GIT paths
  2005-10-25 12:31 ` Andreas Ericsson
  2005-10-25 16:53   ` H. Peter Anvin
@ 2005-10-26  6:12   ` Junio C Hamano
  2005-10-26  8:56     ` Andreas Ericsson
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2005-10-26  6:12 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: git

Andreas Ericsson <ae@op5.se> writes:

> About the literally named /~junio directory, it would be possible with 
> this syntax;
>
> 	git fetch host.xz:/~junio
>
> The userdir is (with my previous patch) only expanded if the path starts 
> with a tilde.

I do not necessarily consider that a feature; see next item.

> It must remove the leading slash for this syntax:
>
> 	ssh://host.xz/~junio/repo
>
> Otherwise it would be passed as /~junio/repo to the remote end and no 
> ~user interpolation would be done.

Not necessarily.  Having the remote end interpret "/~user" and
"~user" the same way might make things more consistent; in other
words, "http://host/~user" is not spelled "http://host~user".

> I'd say make it invoke the programs with the canonicalized path. As you 
> say, git-daemon has to verify that it's a proper git repo and in the 
> whitelist anyway so I think it would be silly to add extra complexity to 
> upload-pack and receive-pack.

Yeah, I tend to agree here.

>>  - Give --server-root=/path/to/root flag to programs...
>
> If we stick with canonicalized paths I suppose this can be dropped.

Sounds good.


* Re: [RFC] GIT paths
  2005-10-26  6:12   ` Junio C Hamano
@ 2005-10-26  8:56     ` Andreas Ericsson
  0 siblings, 0 replies; 5+ messages in thread
From: Andreas Ericsson @ 2005-10-26  8:56 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:
> Andreas Ericsson <ae@op5.se> writes:
> 
>>
>>The userdir is (with my previous patch) only expanded if the path starts 
>>with a tilde.
> 
> 
> I do not necessarily consider that a feature; see next item.
> 
> 
>>It must remove the leading slash for this syntax:
>>
>>	ssh://host.xz/~junio/repo
>>
>>Otherwise it would be passed as /~junio/repo to the remote end and no 
>>~user interpolation would be done.
> 
> 
> Not necessarily.  Having the remote end interpret "/~user" and
> "~user" the same way might make things more consistent;


Except that the shell interprets ~ and /~ differently, so "consistent" 
would depend on what we're consistent with.

There's also the fact that shell scripts won't work on the remote end if 
git_connect() maintains the leading slash. This doesn't matter at 
present, but I think it'd be better to keep all doors open. Having the 
trivial addendum on the client side also helps keep the server side 
nice and simple.

> in other
> words, "http://host/~user" is not spelled "http://host~user".
> 

True. I meant for this to be invisible to the users of course, with 
git_connect() having some snippet such as this:

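/* "/~user": cut the string at the slash and pass "~user/..." on */
if ((use_ssh || use_git) && path[0] == '/' && path[1] == '~')
	*path++ = '\0';
else
	copy_path();	/* placeholder: keep the path, leading slash included */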

> 
>>I'd say make it invoke the programs with the canonicalized path. As you 
>>say, git-daemon has to verify that it's a proper git repo and in the 
>>whitelist anyway so I think it would be silly to add extra complexity to 
>>upload-pack and receive-pack.
> 
> 
> Yeah, I tend to agree here.
> 
> 
>>> - Give --server-root=/path/to/root flag to programs...
>>
>>If we stick with canonicalized paths I suppose this can be dropped.
> 
> 
> Sounds good.
> 
> 

I'll get busy then.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

