git.vger.kernel.org archive mirror
* Revamping the git protocol
@ 2005-10-20  4:31 H. Peter Anvin
  2005-10-20  6:11 ` Junio C Hamano
  2005-10-20 16:20 ` Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: H. Peter Anvin @ 2005-10-20  4:31 UTC (permalink / raw)
  To: Git Mailing List

Okay, so I've started thinking about what it would take to revamp the 
git protocol.  What I came up with seems a little complex, but all it 
really does is take the framework that most successful Internet 
protocols have used and apply it to git.

Something else that I've noticed is that there is functionality overlap 
between git-daemon and git-send-pack, such as the namespace management 
(DWIM functionality.)  Additionally, even when using git over ssh there 
is the potential for version skew, so it might be worthwhile to run the 
full protocol over ssh as well.

Anyway, here is a strawman.  Items I feel unsure about I've put in brackets.

  ----------

1. "Strings" are sequences of bytes prefixed with a length.  The length 
is encoded as four lower-case hexadecimal digits.  [Why not as 2 or 4 
bytes of network byte order binary?]  When represented in this text as 
"foo", this means the sequence of bytes on the wire is <0003foo>.

2. Upon connection, the server will issue a sequence of strings, 
terminated by a null string.  The first string will be of the format:

"git <x.y>[ <hostname>]"

x.y is the protocol revision (currently 1.0) with the following semantics:

- a change in x indicates a fully incompatible protocol change which 
means a client which doesn't understand the exact x version should 
immediately disconnect without issuing any output.

- a change in y indicates a backward-compatible protocol change which 
means a client which understands an older version of the protocol can 
still communicate.

- hostname is an optional canonical name for this server.
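
The version rules above can be sketched in C as follows.  This is only an illustration of the proposal, not code from git: the names parse_greeting, greeting_compatible and CLIENT_MAJOR are hypothetical, and the sketch assumes the greeting has already been stripped of its length prefix.

```c
#include <stdio.h>
#include <string.h>

/* Major protocol revision this hypothetical client implements. */
#define CLIENT_MAJOR 1

struct greeting {
	int major;           /* x in "git x.y" */
	int minor;           /* y in "git x.y" */
	char hostname[256];  /* empty when the server sent none */
};

/*
 * Parse a greeting payload such as "git 1.0 example.org".
 * Returns 0 on success, -1 if the text is not a usable greeting.
 */
static int parse_greeting(const char *s, struct greeting *g)
{
	int used = 0;

	g->hostname[0] = '\0';
	if (sscanf(s, "git %d.%d%n", &g->major, &g->minor, &used) != 2)
		return -1;
	if (s[used] == '\0')
		return 0;                       /* no hostname given */
	if (s[used] != ' ' || s[used + 1] == '\0')
		return -1;
	if (strlen(s + used + 1) >= sizeof(g->hostname))
		return -1;
	strcpy(g->hostname, s + used + 1);
	return 0;
}

/*
 * Only x decides whether the client may continue; a different y is
 * backward-compatible by definition and is therefore not checked.
 */
static int greeting_compatible(const struct greeting *g)
{
	return g->major == CLIENT_MAJOR;
}
```

A client following the proposal would disconnect silently when greeting_compatible() returns 0, and otherwise go on to read the option strings.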

For protocol version 1.0, subsequent strings are of the form:

"<R|O|I> option[ <parameters...>]"

... where the letter indicates REQUIRED, OPTIONAL or INFORMATIVE.  If a 
server specifies a REQUIRED option which the client does not understand 
or support, the CLIENT should terminate with an "unable" command (see 
below).  An OPTIONAL option is available to the client should it choose 
to accept it.  An INFORMATIVE option has no protocol function, but may 
be used to tune the client, inform the client of server policies (such 
as timeouts) or display to the end user if the client is in verbose mode.

Note that the addition of options does not require a new protocol 
revision.  It is generally believed that the protocol revision will 
rarely, if ever, be changed.
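
As a rough illustration of the R/O/I rule, a client could classify each option string like this.  The enum, the function names and the table of known options are all assumptions made for the sketch; the proposal itself only defines the "challenge" option.

```c
#include <string.h>

enum opt_action {
	OPT_USE,        /* known R or O option: the client may act on it */
	OPT_SKIP,       /* unknown OPTIONAL option: safe to ignore */
	OPT_INFO,       /* INFORMATIVE: no protocol function */
	OPT_UNABLE,     /* unknown REQUIRED option: send "unable" and quit */
	OPT_MALFORMED   /* not of the form "<R|O|I> option[ <parameters...>]" */
};

/* Options this hypothetical client implements. */
static const char *known_options[] = { "challenge" };

static int option_known(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(known_options) / sizeof(known_options[0]); i++)
		if (strlen(known_options[i]) == len &&
		    !strncmp(known_options[i], name, len))
			return 1;
	return 0;
}

/* Decide what to do with one option string sent by the server. */
static enum opt_action classify_option(const char *s)
{
	const char *name;
	size_t len;

	if ((s[0] != 'R' && s[0] != 'O' && s[0] != 'I') || s[1] != ' ')
		return OPT_MALFORMED;
	name = s + 2;
	len = strcspn(name, " ");       /* option name ends at first space */
	if (len == 0)
		return OPT_MALFORMED;
	if (s[0] == 'I')
		return OPT_INFO;
	if (option_known(name, len))
		return OPT_USE;
	return s[0] == 'R' ? OPT_UNABLE : OPT_SKIP;
}
```

The point of the three letters is visible in the last line: the same unknown option name is fatal when REQUIRED and harmless when OPTIONAL, so servers can add options without a protocol revision.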

2a. Option "challenge":

     "R challenge <seed>"

... where 'seed' is any sequence of bytes, means that the client should 
compute the SHA-1 of the seed and issue a "response" command with the 
SHA-1 in hexadecimal form before issuing any other command.

3. After receiving the list of options, the client can issue commands. 
Commands are strings beginning with a command, one space, and any 
arguments as appropriate to the command.

4. The response to a command is a string beginning with a dot-separated 
sequence of numbers, one space, and an optional human-readable text 
string.  Each part of the dot-separated sequence refines the response; 
if a client receives "3.1.1.6 foo" and doesn't know what it is, but 
knows what a "3.1" response is, it should treat the 3.1.1.6 response as 
a 3.1 response.

If the server is closing the connection, the response is prefixed with 
the letter 'C':

"C5.0.1 Incorrect response"

Future versions of the protocol might define new prefix letters; if a 
client encounters unknown prefix letters they should be ignored.

2	- successful completion, closing connection
3	- successful initiation, begin transaction
4	- transient error
4.1	- server resource exhaustion errors
4.1.1	- load too high
5	- permanent error
5.1	- protocol errors
5.2	- authentication error
5.2.1	- invalid response to challenge option
5.3	- permission errors
5.3.1	- repository access denied
5.4	- data integrity error
5.4.1	- invalid or corrupt repository
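
The fallback rule for dotted response codes ("3.1.1.6" is handled as "3.1" if that is the most specific class the client knows) could be sketched as below.  The table of known codes and the name classify_response are assumptions for the example; the sketch also skips any upper-case prefix letters and only gives 'C' a meaning, as the text above suggests.

```c
#include <string.h>

/* Response classes this hypothetical client knows how to handle. */
static const char *known_codes[] = {
	"2", "3", "3.1", "4", "4.1", "5", "5.1", "5.2", "5.3", "5.4"
};

/*
 * Parse "[prefix letters]<code>[ <text>]".  Sets *closing when the 'C'
 * prefix is present and returns the longest known code that matches the
 * response code on a component boundary, or NULL if none matches.
 */
static const char *classify_response(const char *resp, int *closing)
{
	const char *best = NULL;
	size_t code_len, i;

	*closing = 0;
	while (*resp >= 'A' && *resp <= 'Z') {
		if (*resp == 'C')
			*closing = 1;   /* unknown letters are ignored */
		resp++;
	}
	code_len = strspn(resp, "0123456789.");
	for (i = 0; i < sizeof(known_codes) / sizeof(known_codes[0]); i++) {
		size_t n = strlen(known_codes[i]);

		if (n > code_len || strncmp(resp, known_codes[i], n))
			continue;
		if (n != code_len && resp[n] != '.')
			continue;       /* "3.1" must not match "3.10" */
		if (!best || n > strlen(best))
			best = known_codes[i];
	}
	return best;
}
```

With this table, "3.1.1.6 foo" is classified as "3.1" and "C5.0.1 Incorrect response" as a closing "5", so a client written against today's code list keeps working when a server starts sending more specific codes.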

5. Commands, and their responses:

"response <sha1>"

... response to a "challenge" option.  Responses:

"2.0 OK" - response accepted
"C5.2.1 Invalid response" - invalid response

"unable <human error message>"

... error message from the client to the server due to an unsupported R 
option.  Sending this message can inform the server administrator of 
version skew problems.

Response:

"C5.1.1 Too bad"

"send-pack <path>"

... begin synchronization of the repository at <path>.  Responses:

"3.1.1 Begin"
Any 4.1 response
Any 5.3 or 5.4 response


Clearly this needs to be fleshed out a bit more... is this total 
insanity on my part, or is this something worth doing?

	-hpa


* Re: Revamping the git protocol
  2005-10-20  4:31 Revamping the git protocol H. Peter Anvin
@ 2005-10-20  6:11 ` Junio C Hamano
  2005-10-20  9:12   ` Petr Baudis
  2005-10-20 16:20 ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-10-20  6:11 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: git

Wow.

That's elaborate.  And all this is to replace the beginning of
execute() part of daemon.c?  What I am assuming is that after
exchanging command-response initially, you still plan to
eventually have the protocol driver such as upload-pack to take
things over, once "send-pack <path>" is issued, but is my
assumption correct?  Or are you also thinking about redoing
upload-pack as well (otherwise you cannot issue 5.4 errors)?

I am wondering if we can just get away with a simpler scheme
Linus outlined instead.  One drawback of that approach is it
does not easily allow things like challenge-response uniformly
across different commands (admittedly we only have "upload-pack"
command right now, but we could add list of supported commands
easily in execute()), but you could do something along these lines, I
presume?

When the daemon is started with --require-challenge-response,
the client needs to issue the "challenge-me" command and complete
challenge_response successfully before being able to issue any
other commands.

NOTE: this is just an outline, not a compilable patch.  You need to
fill in the details of challenge response, definition of
"require_challenge_response" variable of type bool, and a
command line parsing to set that variable.


---

git diff
diff --git a/daemon.c b/daemon.c
index c3381b3..8a8746a 100644
--- a/daemon.c
+++ b/daemon.c
@@ -204,20 +204,56 @@ static int upload(char *dir)
 	return -1;
 }
 
-static int execute(void)
+static int challenge_response(const char *me)
 {
-	static char line[1000];
 	int len;
+	char line[1000];
 
-	alarm(init_timeout ? init_timeout : timeout);
+	packet_write(1, "here comes your challenge");
+
+	alarm(timeout);
 	len = packet_read_line(0, line, sizeof(line));
 	alarm(0);
 
 	if (len && line[len-1] == '\n')
 		line[--len] = 0;
 
-	if (!strncmp("git-upload-pack /", line, 17))
-		return upload(line+16);
+	if ("validate response we obtained in line here")
+		return 1;
+	return 0;
+}
+
+static int execute(void)
+{
+	static char line[1000];
+	int len;
+	int client_ok = !require_challenge_response;
+	unsigned int time_out = init_timeout ? init_timeout : timeout;
+
+	while (1) {
+
+		alarm(time_out);
+		time_out = timeout;
+		len = packet_read_line(0, line, sizeof(line));
+		alarm(0);
+		if (len && line[len-1] == '\n')
+			line[--len] = 0;
+
+		if (!strncmp("challenge-me ", line, 13)) {
+			client_ok = challenge_response(line+13);
+			continue;
+		}
+
+		if (!client_ok)
+			break;
+
+		if (!strncmp("git-upload-pack /", line, 17))
+			return upload(line+16);
+
+		/* more commands here later */
+
+		break;
+	}
 
 	logerror("Protocol error: '%s'", line);
 	return -1;


* Re: Revamping the git protocol
  2005-10-20  6:11 ` Junio C Hamano
@ 2005-10-20  9:12   ` Petr Baudis
  2005-10-20 15:50     ` H. Peter Anvin
  2005-10-20 16:38     ` Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: Petr Baudis @ 2005-10-20  9:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: H. Peter Anvin, git

Dear diary, on Thu, Oct 20, 2005 at 08:11:17AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> I am wondering if we can just get away with a simpler scheme
> Linus outlined instead.  One drawback of that approach is it
> does not easily allow things like challenge-response uniformly
> across different commands (admittedly we only have "upload-pack"
> command right now, but we could add list of supported commands
> easily in execute()), but you could do something along these lines, I
> presume?

What's wrong with my scheme? That is, _reply_ with a challenge to the
upload-pack command. This should be as powerful as Linus' scheme, and
the crucial advantage is that you do not need to tell at the client
side whether you are talking to a new server or an old one.

I was convinced that the authentication part of the challenge-response
isn't such a good idea after all, though. ;-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.


* Re: Revamping the git protocol
  2005-10-20  9:12   ` Petr Baudis
@ 2005-10-20 15:50     ` H. Peter Anvin
  2005-10-21  1:04       ` Petr Baudis
  2005-10-20 16:38     ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2005-10-20 15:50 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git

Petr Baudis wrote:
> Dear diary, on Thu, Oct 20, 2005 at 08:11:17AM CEST, I got a letter
> where Junio C Hamano <junkio@cox.net> told me that...
> 
>>I am wondering if we can just get away with a simpler scheme
>>Linus outlined instead.  One drawback of that approach is it
>>does not easily allow things like challenge-response uniformly
>>across different commands (admittedly we only have "upload-pack"
>>command right now, but we could add list of supported commands
>>easily in execute()), but you could do something along these lines, I
>>presume?
> 
> What's wrong with my scheme? That is, _reply_ with a challenge to the
> upload-pack command. This should be as powerful as Linus' scheme, and
> the crucial advantage is that you do not need to tell at the client
> side whether you are talking to a new server or an old one.
> 
> I was convinced that the authentication part of the challenge-response
> isn't such a good idea after all, though. ;-)
> 

Has anyone noticed that neither of those schemes is actually 
backward-compatible in any way (an old client talking to a new server 
will be disconnected), and that unfortunately that is the best one can 
do with the current setup, exactly because there is no option 
negotiation phase?

Another issue is that currently there is no error information propagated 
back to the client; the server logs an error in its own logs, but the 
client is simply disconnected.

	-hpa


* Re: Revamping the git protocol
  2005-10-20  4:31 Revamping the git protocol H. Peter Anvin
  2005-10-20  6:11 ` Junio C Hamano
@ 2005-10-20 16:20 ` Linus Torvalds
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2005-10-20 16:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List



On Wed, 19 Oct 2005, H. Peter Anvin wrote:
> 
> 1. "Strings" are sequences of bytes prefixed with a length.  The length is
> encoded as four lower-case hexadecimal digits.  [Why not as 2 or 4 bytes of
> network byte order binary?]  When represented in this text as "foo", this
> means the sequence of bytes on the wire is <0003foo>.

As a reason for your "why" - imagine debugging a protocol using telnet..

ASCII really is very nice for things like that.

And no, "foo" is not represented as <0003foo>. It's represented as 
<0007foo>, because the length includes the length of the prefix.

The special sequence <0000> is a flush sequence, and it's designed so that 
it's supposed to be distinguishable from an empty string <0004>. A <0001> 
to <0003> will be rejected as an error. Maximum string length is thus 
65531.

(Actually, right now flush is _not_ distinguishable from an empty 
string because we return 0 for both cases from packet_read_line(), but the 
point is that the packet protocol _supports_ it being distinguishable 
if we ever need it to).
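
The framing rules described here (length counts the four prefix bytes, <0000> is a flush, <0001> to <0003> are errors, maximum payload 65531) can be sketched as follows.  This is an illustrative stand-alone version, not the code in pkt-line.c: the names pkt_encode and pkt_decode_length and the two negative return constants are made up for the example, and unlike the packet_read_line() mentioned above it does report flush and empty string differently.

```c
#include <stdio.h>
#include <string.h>

#define PKT_PREFIX      4       /* four hex digits of length */
#define PKT_MAX_PAYLOAD 65531   /* 0xffff minus the prefix itself */
#define PKT_FLUSH       (-1)
#define PKT_MALFORMED   (-2)

/*
 * Frame `payload` into `out`.  Returns the number of bytes that go on
 * the wire, or -1 if the payload is too long or `out` is too small.
 * A trailing NUL is added for convenience only; it is not sent.
 */
static int pkt_encode(char *out, size_t outsz, const char *payload, size_t len)
{
	if (len > PKT_MAX_PAYLOAD || outsz < len + PKT_PREFIX + 1)
		return -1;
	snprintf(out, PKT_PREFIX + 1, "%04x", (unsigned)(len + PKT_PREFIX));
	memcpy(out + PKT_PREFIX, payload, len);
	out[len + PKT_PREFIX] = '\0';
	return (int)(len + PKT_PREFIX);
}

/*
 * Decode a four-digit length prefix.  Returns the payload length,
 * PKT_FLUSH for "0000", or PKT_MALFORMED for bad digits and for the
 * impossible totals 1..3.  Only lower-case hex is accepted here.
 */
static int pkt_decode_length(const char *hdr)
{
	unsigned total = 0;
	int i;

	for (i = 0; i < PKT_PREFIX; i++) {
		char c = hdr[i];
		unsigned digit;

		if (c >= '0' && c <= '9')
			digit = (unsigned)(c - '0');
		else if (c >= 'a' && c <= 'f')
			digit = (unsigned)(c - 'a') + 10;
		else
			return PKT_MALFORMED;
		total = total * 16 + digit;
	}
	if (total == 0)
		return PKT_FLUSH;
	if (total < PKT_PREFIX)
		return PKT_MALFORMED;
	return (int)(total - PKT_PREFIX);
}
```

Encoding "foo" this way yields 0007foo, and the empty string yields 0004, which is what keeps it distinct from the flush packet 0000.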

			Linus


* Re: Revamping the git protocol
  2005-10-20  9:12   ` Petr Baudis
  2005-10-20 15:50     ` H. Peter Anvin
@ 2005-10-20 16:38     ` Linus Torvalds
  2005-10-20 16:52       ` H. Peter Anvin
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2005-10-20 16:38 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, H. Peter Anvin, git



On Thu, 20 Oct 2005, Petr Baudis wrote:
> 
> What's wrong with my scheme? That is, _reply_ with challenge to the
> upload-pack command.

Neither your nor Peter's scheme seems to be at all worried about backwards 
compatibility, and I just don't see _why_.

Even if you can upgrade all servers (there aren't that many of them), why 
force a client upgrade when the protocol is designed to be extensible?

Especially for something that doesn't even _buy_ you anything right now.

In fact, I'm not even sure it buys you anything in the future. The thing 
is, SYN-flooding depends on overwhelming you with lots of simple packets. 
And since in the git protocol, the expense is not in the _packets_ but in 
the server-side packing and data transfer, I don't see the point.

If you want to DoS a git pack server, you open a hundred _real_ git 
connections to it, carefully selected so that they get unique packs (so 
that the server can't cache them). You don't need to have some distributed 
denial-of-service attack with lots of magic packets.

This is why the git daemon already limits the clients to 25 by default or 
something like that - it doesn't want to put too much strain on the 
server.

A much more important thing the git daemon could do is to kill connections 
from the same IP address when there's more than 25 pending ones. The 
daemon actually has the infrastructure for that - it's why it doesn't just 
count its children, it actually saves child information away (it just 
doesn't _use_ it for anything right now).

Similarly, git-upload-pack can be future-proofed by having it have some 
data transfer timeout: if it doesn't make any progress at all in <n> 
seconds, just kill itself. Things like _that_ are likely to be a lot more 
important, I suspect.
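
A per-address limit of the kind suggested above might be sketched like this.  Everything in it is hypothetical: the struct layout, the names and the use of a plain IPv4 address are assumptions for illustration, and the daemon's actual child bookkeeping may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PER_ADDR 25   /* same figure as the overall default mentioned above */

/* Hypothetical record the daemon keeps for each child it has forked. */
struct child {
	uint32_t addr;    /* peer IPv4 address, host byte order */
	int live;         /* nonzero while the child is still running */
};

/* Count live children that are serving the given peer address. */
static size_t count_from(const struct child *c, size_t n, uint32_t addr)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (c[i].live && c[i].addr == addr)
			count++;
	return count;
}

/*
 * Should a new connection from `addr` be refused?  True once accepting
 * it would put that address over MAX_PER_ADDR live connections.
 */
static int over_limit(const struct child *c, size_t n, uint32_t addr)
{
	return count_from(c, n, addr) >= MAX_PER_ADDR;
}
```

The linear scan is deliberate: with the daemon capping its children at a few dozen, a table lookup would add complexity without a measurable gain.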

And no, I don't think the git protocol should do authentication. It's 
hard. If you want to do authentication, you need to do encryption too, and 
then you should do something else (but the git protocol _does_ work fine 
over an encrypted channel, so the "something else" might be to have some 
secure web interface tunnel protocol or similar, and then just support 
"git over https" or something ;).

			Linus


* Re: Revamping the git protocol
  2005-10-20 16:38     ` Linus Torvalds
@ 2005-10-20 16:52       ` H. Peter Anvin
  2005-10-20 17:17         ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2005-10-20 16:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git

Linus Torvalds wrote:
> 
> Similarly, git-upload-pack can be future-proofed by having it have some 
> data transfer timeout: if it doesn't make any progress at all in <n> 
> seconds, just kill itself. Things like _that_ are likely to be a lot more 
> important, I suspect.
> 

Right, I already submitted a patch for that.

> And no, I don't think the git protocol should do authentication. It's 
> hard. If you want to do authentication, you need to do encryption too, and 
> then you should do something else (but the git protocol _does_ work fine 
> over an encrypted channel, so the "something else" might be to have some 
> secure web interface tunnel protocol or similar, and then just support 
> "git over https" or something ;).

git over ssh seems to be the obvious choice.

	-hpa


* Re: Revamping the git protocol
  2005-10-20 16:52       ` H. Peter Anvin
@ 2005-10-20 17:17         ` Linus Torvalds
  2005-10-20 23:35           ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2005-10-20 17:17 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Petr Baudis, Junio C Hamano, git



On Thu, 20 Oct 2005, H. Peter Anvin wrote:
> 
> git over ssh seems to be the obvious choice.

Yes, but Petr is right that there might be room for some lighter-weight 
"gits" secure protocol. One that doesn't necessarily require a whole user 
ID thing.

For example, let's say that you're not the maintainer of your machine, but 
you're in an environment where you are allowed to run daemons as yourself 
(at a university, for example). And you have a group of people who want to 
work together at a project, but they don't want to give write permissions 
to the world or their bigger group (group "student").

And git itself _does_ actually support that, already. You can use the 
standard "ssh:" thing (or just "hostname:pathname"), and the GIT_SSH 
environment variable to set up any tunnelling program you want. Then you 
can authenticate any way you want (and encrypt or not, whatever)..

So if somebody is in this situation, maybe we could have an example tunnel 
client/server thing that does this.

This is unrelated to the git protocol itself, although the "pack over 
ssh/tunnel" obviously uses all the same stuff for the actual transfer.

(It might also be worthwhile to have .git/config specify what program to 
use, so that you don't need a global environment variable. It might even 
be per-host, ie we could have git-send-pack and git-fetch-pack understand 
config language like

	[connect]
		program=[server.uni.edu]:mytunnel

or something. It shouldn't even be hard to do. Certainly simpler than 
doing a good authenticating tunnel).

		Linus


* Re: Revamping the git protocol
  2005-10-20 17:17         ` Linus Torvalds
@ 2005-10-20 23:35           ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2005-10-20 23:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Petr Baudis, Junio C Hamano, git

Hi,

On Thu, 20 Oct 2005, Linus Torvalds wrote:

> On Thu, 20 Oct 2005, H. Peter Anvin wrote:
> > 
> > git over ssh seems to be the obvious choice.
> 
> Yes, but Petr is right that there might be room for some lighter-weight 
> "gits" secure protocol. One that doesn't necessarily require a whole user 
> ID thing.
> 
> For example, let's say that you're not the maintainer of your machine, but 
> you're in an environment where you are allowed to run daemons as yourself 
> (at a university, for example). And you have a group of people who want to 
> work together at a project, but they don't want to give write permissions 
> to the world or their bigger group (group "student").

If you are not the maintainer, you could still start an SSH daemon which 
listens on a port>1024 and gets its password data from a file different 
from /etc/shadow (You could even use PAM...).

Ciao,
Dscho


* Re: Revamping the git protocol
  2005-10-20 15:50     ` H. Peter Anvin
@ 2005-10-21  1:04       ` Petr Baudis
  0 siblings, 0 replies; 10+ messages in thread
From: Petr Baudis @ 2005-10-21  1:04 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Junio C Hamano, git

Dear diary, on Thu, Oct 20, 2005 at 05:50:20PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> Another issue is that currently there is no error information propagated 
> back to the client; the server logs an error in its own logs, but the 
> client is simply disconnected.

Yes. I agree that while it seems quite complex compared to what we have
now, your proposal has good points. But if we are going with the
challenge-response at all and if we are going with the simple form,
I was merely trying to make sure that it is as compatible as possible.

> Has anyone noticed that neither of those schemes is actually 
> backward-compatible in any way (an old client talking to a new server 
> will be disconnected), and that unfortunately that is the best one can 
> do with the current setup, exactly because there is no option 
> negotiation phase?

Yes, option negotiation would solve this for us. But my scheme _is_
backwards-compatible in the sense that a new client talking to an old
server will not be disconnected, so it's 50% better than the original
proposal.

But I think that considering the long run, we should either not do this
challenge-response thing at all, and fix the problem by other (Linus')
means, or go for the "complex" scheme. I'd prefer the latter - sending
the error messages to the client alone is a huge improvement.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.

