* Revamping the git protocol
@ 2005-10-20 4:31 H. Peter Anvin
2005-10-20 6:11 ` Junio C Hamano
2005-10-20 16:20 ` Linus Torvalds
0 siblings, 2 replies; 10+ messages in thread
From: H. Peter Anvin @ 2005-10-20 4:31 UTC (permalink / raw)
To: Git Mailing List
Okay, so I've started thinking about what it would take to revamp the
git protocol. What I came up with seems a little complex, but all it
really does is take the framework that most successful Internet protocols
have used and apply it to git.
Something else that I've noticed is that there is functionality overlap
between git-daemon and git-send-pack, such as the namespace management
(DWIM functionality.) Additionally, even when using git over ssh there
is the potential for version skew, so it might be worthwhile to run the
full protocol over ssh as well.
Anyway, here is a strawman. Items I feel unsure about I've put in brackets.
----------
1. "Strings" are sequences of bytes prefixed with a length. The length
is encoded as four lower-case hexadecimal digits. [Why not as 2 or 4
bytes of network byte order binary?] When represented in this text as
"foo", this means the sequence of bytes on the wire is <0003foo>.
2. Upon connection, the server will issue a sequence of strings,
terminated by a null string. The first string will be of the format:
"git <x.y>[ <hostname>]"
x.y is protocol revision (currently 1.0) with the following semantics:
- a change in x indicates a fully incompatible protocol change which
means a client which doesn't understand the exact x version should
immediately disconnect without issuing any output.
- a change in y indicates a backward-compatible protocol change which
means a client which understands an older version of the protocol can
still communicate.
- hostname is an optional canonical name for this server.
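As an illustration of the x.y rule above, here is a minimal client-side sketch. It assumes the greeting payload has already been unframed into a NUL-terminated string; the names `check_greeting`, `CLIENT_MAJOR` and the `greeting_result` enum are hypothetical and not part of the proposal or of git.

```c
#include <stdio.h>

/* Possible outcomes of inspecting the server greeting. */
enum greeting_result {
	GREETING_OK,           /* same x: keep talking */
	GREETING_INCOMPATIBLE, /* different x: disconnect without output */
	GREETING_MALFORMED     /* not a "git <x.y>" greeting at all */
};

/* Major protocol version this hypothetical client implements. */
#define CLIENT_MAJOR 1

/*
 * Parse a greeting payload such as "git 1.0" or "git 1.3 host.example.org".
 * On success the parsed numbers are stored in *major and *minor.
 * The optional hostname is not examined here.
 */
static enum greeting_result check_greeting(const char *payload,
					   int *major, int *minor)
{
	if (sscanf(payload, "git %d.%d", major, minor) != 2)
		return GREETING_MALFORMED;
	if (*major != CLIENT_MAJOR)
		return GREETING_INCOMPATIBLE;
	/* A differing y is backward-compatible by definition. */
	return GREETING_OK;
}
```

Only x decides whether the client stays connected; y is returned to the caller purely so that it can enable newer features if it knows about them.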
For protocol version 1.0, subsequent strings are of the form:
"<R|O|I> option[ <parameters...>]"
... where the letter indicates REQUIRED, OPTIONAL or INFORMATIVE. If a
server specifies a REQUIRED option which the client does not understand
or support, the CLIENT should terminate with an "unable" command (see
below). An OPTIONAL option is available to the client should it choose
to accept it. An INFORMATIVE option has no protocol function, but may
be used to tune the client, inform the client of server policies (such
as timeouts) or display to the end user if the client is in verbose mode.
Note that the addition of options does not require a new protocol
revision. It is generally believed that the protocol revision will
rarely, if ever, be changed.
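The R/O/I handling described above could be sketched on the client side roughly as follows. This is only an assumption-laden outline: it treats each option string as NUL-terminated text, and `classify_option`, `known_options` and the `option_action` enum are hypothetical names invented for the example.

```c
#include <string.h>

enum option_action {
	OPTION_USE,       /* understood: the client may (or must) act on it */
	OPTION_IGNORE,    /* unknown O or I option: safe to skip */
	OPTION_UNABLE,    /* unknown R option: send "unable" and stop */
	OPTION_MALFORMED  /* line does not match "<R|O|I> option..." */
};

/* Options this hypothetical client understands. */
static const char *known_options[] = { "challenge", NULL };

/* Return 1 if the first len bytes of name spell a known option. */
static int is_known(const char *name, size_t len)
{
	for (const char **p = known_options; *p; p++)
		if (strlen(*p) == len && !strncmp(*p, name, len))
			return 1;
	return 0;
}

/* Decide what to do with one option string sent by the server. */
static enum option_action classify_option(const char *line)
{
	char kind = line[0];
	const char *name;
	size_t len;

	if ((kind != 'R' && kind != 'O' && kind != 'I') || line[1] != ' ')
		return OPTION_MALFORMED;
	name = line + 2;
	len = strcspn(name, " ");   /* option name ends at the next space */
	if (!len)
		return OPTION_MALFORMED;
	if (is_known(name, len))
		return OPTION_USE;
	return kind == 'R' ? OPTION_UNABLE : OPTION_IGNORE;
}
```

Because unknown O and I options are skipped rather than rejected, a server can add options without breaking older clients, which is why no protocol revision bump is needed.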
2a. Option "challenge":
"R challenge <seed>"
... where 'seed' is any sequence of bytes means that the client should
compute the SHA-1 of the seed and issue a "response" command with the
SHA1 in hexadecimal form before issuing any other command.
3. After receiving the list of options, the client can issue commands.
Commands are strings beginning with a command, one space, and any
arguments as appropriate to the command.
4. The response to a command is a string beginning with a dot-separated
sequence of numbers, one space, and an optional human-readable text
string. Each part of the dot-separated sequence refines the response;
if a client receives "3.1.1.6 foo" and doesn't know what it is, but
knows what a "3.1" response is, it should treat the 3.1.1.6 response as
a 3.1 response.
If the server is closing the connection, the response is prefixed with
the letter 'C':
"C5.0.1 Incorrect response"
Future versions of the protocol might define new prefix letters; if a
client encounters unknown prefix letters they should be ignored.
2 - successful completion, closing connection
3 - successful initiation, begin transaction
4 - transient error
4.1 - server resource exhaustion errors
4.1.1 - load too high
5 - permanent error
5.1 - protocol errors
5.2 - authentication error
5.2.1 - invalid response to challenge option
5.3 - permission errors
5.3.1 - repository access denied
5.4 - data integrity error
5.4.1 - invalid or corrupt repository
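The refinement rule for response codes (treat an unknown "3.1.1.6" as a "3.1") and the prefix-letter rule can be sketched as below. This is an illustrative outline, not code from git: `match_response` and the contents of `known_codes` are hypothetical, and the response is assumed to be a NUL-terminated string.

```c
#include <ctype.h>
#include <string.h>

/* Response classes this hypothetical client knows how to handle. */
static const char *known_codes[] = {
	"2", "3", "3.1", "4", "4.1", "5", "5.2", NULL
};

/*
 * Return the most specific known code that matches the start of the
 * numeric code in `response` on a component boundary, or NULL if no
 * known code matches.  *closing is set to 1 when the 'C' prefix letter
 * is present; other prefix letters are skipped and ignored.
 */
static const char *match_response(const char *response, int *closing)
{
	const char *best = NULL;
	size_t best_len = 0, code_len;

	*closing = 0;
	while (isalpha((unsigned char)*response)) {
		if (*response == 'C')
			*closing = 1;
		response++;
	}
	code_len = strspn(response, "0123456789.");
	for (const char **p = known_codes; *p; p++) {
		size_t n = strlen(*p);

		if (n > code_len || n <= best_len)
			continue;
		if (strncmp(*p, response, n))
			continue;
		/* "3.1" must match "3.1.1.6" but not "3.10". */
		if (n == code_len || response[n] == '.') {
			best = *p;
			best_len = n;
		}
	}
	return best;
}
```

With this table, "3.1.1.6 foo" resolves to "3.1", and "C5.0.1 Incorrect response" resolves to the generic "5" class with the closing flag set.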
5. Commands, and their responses:
"response <sha1>"
... response to a "challenge" option. Responses:
"2.0 OK" - response accepted
"C5.2.1 Invalid response" - invalid response
"unable <human error message>"
... error message from the client to the server due to an unsupported R
option. Sending this message can inform the server administrator of
version skew problems.
Response:
"C5.1.1 Too bad"
"send-pack <path>"
... begin synchronization of the repository at <path>. Responses:
"3.1.1 Begin"
Any 4.1 response
Any 5.3 or 5.4 response
Clearly this needs to be fleshed out a bit more... is this total
insanity on my part, or is this something worth doing?
-hpa
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Revamping the git protocol
2005-10-20 4:31 Revamping the git protocol H. Peter Anvin
@ 2005-10-20 6:11 ` Junio C Hamano
2005-10-20 9:12 ` Petr Baudis
2005-10-20 16:20 ` Linus Torvalds
1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-10-20 6:11 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: git
Wow.
That's elaborate. And all this is to replace the beginning of
execute() part of daemon.c? What I am assuming is that after
exchanging command-response initially, you still plan to
eventually have the protocol driver such as upload-pack to take
things over, once "send-pack <path>" is issued, but is my
assumption correct? Or are you also thinking about redoing
upload-pack as well (otherwise you cannot issue 5.4 errors)?
I am wondering if we can just get away with a simpler scheme
Linus outlined instead. One drawback of that approach is it
does not easily allow things like challenge-response uniformly
across different commands (admittedly we only have "upload-pack"
command right now, but we could add list of supported commands
easily in execute()), but you could do something along this, I
presume?
When daemon is started with --require-challenge-response,
the client needs to issue "challenge-me" command and complete
challenge_response successfully before being able to issue any
other commands.
NOTE: this is just an outline, not a compilable patch. You need to
fill in the details of challenge response, definition of
"require_challenge_response" variable of type bool, and a
command line parsing to set that variable.
---
git diff
diff --git a/daemon.c b/daemon.c
index c3381b3..8a8746a 100644
--- a/daemon.c
+++ b/daemon.c
@@ -204,20 +204,55 @@ static int upload(char *dir)
return -1;
}
-static int execute(void)
+static int challenge_response(const char *me)
{
- static char line[1000];
- int len;
+ char line[1000];
+ int len;
- alarm(init_timeout ? init_timeout : timeout);
+ packet_write(1, "here comes your challenge");
+
+ alarm(timeout);
len = packet_read_line(0, line, sizeof(line));
alarm(0);
if (len && line[len-1] == '\n')
line[--len] = 0;
- if (!strncmp("git-upload-pack /", line, 17))
- return upload(line+16);
+ if ("validate response we obtained in line here")
+ return 1;
+ return 0;
+}
+
+static int execute(void)
+{
+ static char line[1000];
+ int len;
+ int client_ok = !require_challenge_response;
+ unsigned int time_out = init_timeout;
+
+ while (1) {
+
+ alarm(time_out);
+ time_out = timeout;
+ len = packet_read_line(0, line, sizeof(line));
+ alarm(0);
+ if (len && line[len-1] == '\n')
+ line[--len] = 0;
+
+ if (!strncmp("challenge-me ", line, 13)) {
+ client_ok = challenge_response(line+13);
+ continue;
+ }
+
+ if (!client_ok)
+ break;
+
+ if (!strncmp("git-upload-pack /", line, 17))
+ return upload(line+16);
+
+ /* more commands here later */
+
+ break;
+ }
logerror("Protocol error: '%s'", line);
return -1;
* Re: Revamping the git protocol
2005-10-20 6:11 ` Junio C Hamano
@ 2005-10-20 9:12 ` Petr Baudis
2005-10-20 15:50 ` H. Peter Anvin
2005-10-20 16:38 ` Linus Torvalds
0 siblings, 2 replies; 10+ messages in thread
From: Petr Baudis @ 2005-10-20 9:12 UTC (permalink / raw)
To: Junio C Hamano; +Cc: H. Peter Anvin, git
Dear diary, on Thu, Oct 20, 2005 at 08:11:17AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> I am wondering if we can just get away with a simpler scheme
> Linus outlined instead. One drawback of that approach is it
> does not easily allow things like challenge-response uniformly
> across different commands (admittedly we only have "upload-pack"
> command right now, but we could add list of supported commands
> easily in execute()), but you could do something along this, I
> presume?
What's wrong with my scheme? That is, _reply_ with challenge to the
upload-pack command. This should be equally powerful to the Linus'
scheme and the crucial advantage is that you do not need to tell at
the client side whether you are talking to a new server or an old one.
I was convinced that the authentication part of the challenge-response
isn't such a good idea after all, though. ;-)
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
* Re: Revamping the git protocol
2005-10-20 9:12 ` Petr Baudis
@ 2005-10-20 15:50 ` H. Peter Anvin
2005-10-21 1:04 ` Petr Baudis
2005-10-20 16:38 ` Linus Torvalds
1 sibling, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2005-10-20 15:50 UTC (permalink / raw)
To: Petr Baudis; +Cc: Junio C Hamano, git
Petr Baudis wrote:
> Dear diary, on Thu, Oct 20, 2005 at 08:11:17AM CEST, I got a letter
> where Junio C Hamano <junkio@cox.net> told me that...
>
>>I am wondering if we can just get away with a simpler scheme
>>Linus outlined instead. One drawback of that approach is it
>>does not easily allow things like challenge-response uniformly
>>across different commands (admittedly we only have "upload-pack"
>>command right now, but we could add list of supported commands
>>easily in execute()), but you could do something along this, I
>>presume?
>
> What's wrong with my scheme? That is, _reply_ with challenge to the
> upload-pack command. This should be equally powerful to the Linus'
> scheme and the crucial advantage is that you do not need to tell at
> the client side whether you are talking to a new server or an old one.
>
> I was convinced that the authentication part of the challenge-response
> isn't such a good idea after all, though. ;-)
>
Anyone noticed that neither of those schemes is actually
backward-compatible in any way (old client talking to new server will be
disconnected), and that unfortunately is the best thing one can do with
the current setup, exactly because there is no option negotiation phase?
Another issue is that currently there is no error information propagated
back to the client; the server logs an error in its own logs, but the
client is simply disconnected.
-hpa
* Re: Revamping the git protocol
2005-10-20 15:50 ` H. Peter Anvin
@ 2005-10-21 1:04 ` Petr Baudis
0 siblings, 0 replies; 10+ messages in thread
From: Petr Baudis @ 2005-10-21 1:04 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Junio C Hamano, git
Dear diary, on Thu, Oct 20, 2005 at 05:50:20PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> Another issue is that currently there is no error information propagated
> back to the client; the server logs an error in its own logs, but the
> client is simply disconnected.
Yes. I agree that while it seems quite complex compared to what we have
now, your proposal has good points. But if we are going with the
challenge-response at all and if we are going with the simple form,
I was merely trying to make sure that it is as compatible as possible.
> Anyone noticed that neither of those schemes is actually
> backward-compatible in any way (old client talking to new server will be
> disconnected), and that unfortunately is the best thing one can do with
> the current setup, exactly because there is no option negotiation phase?
Yes, option negotiation would solve this for us. But my scheme _is_
backwards-compatible in the way that new client talking to old server
will not be disconnected, so it's 50% better than the original proposal.
But I think that considering the long run, we should either not do this
challenge-response thing at all, and fix the problem by other (Linus')
means, or go for the "complex" scheme. I'd prefer the latter - sending
the error messages to the client alone is a huge improvement.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
* Re: Revamping the git protocol
2005-10-20 9:12 ` Petr Baudis
2005-10-20 15:50 ` H. Peter Anvin
@ 2005-10-20 16:38 ` Linus Torvalds
2005-10-20 16:52 ` H. Peter Anvin
1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2005-10-20 16:38 UTC (permalink / raw)
To: Petr Baudis; +Cc: Junio C Hamano, H. Peter Anvin, git
On Thu, 20 Oct 2005, Petr Baudis wrote:
>
> What's wrong with my scheme? That is, _reply_ with challenge to the
> upload-pack command.
Neither your nor Peter's scheme seems to be at all worried about backwards
compatibility, and I just don't see _why_.
Even if you can upgrade all servers (there aren't that many of them), why
force a client upgrade when the protocol is designed to be extensible?
Especially for something that doesn't even _buy_ you anything right now.
In fact, I'm not even sure it buys you anything in the future. The thing
is, SYN-flooding depends on overwhelming you with lots of simple packets.
And since in the git protocol, the expense is not in the _packets_ but in
the server-side packing and data transfer, I don't see the point.
If you want to DoS a git pack server, you open a hundred _real_ git
connections to it, carefully selected so that they get unique packs (so
that the server can't cache them). You don't need to have some distributed
denial-of-service attack with lots of magic packets.
This is why the git daemon already limits the clients to 25 by default or
something like that - it doesn't want to put too much strain on the
server.
A much more important thing the git daemon could do is to kill connections
from the same IP address when there's more than 25 pending ones. The
daemon actually has the infrastructure for that - it's why it doesn't just
count its children, it actually saves child information away (it just
doesn't _use_ it for anything right now).
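A per-address limit of the kind described here might look like the sketch below. It is not the daemon's actual bookkeeping: `struct child_info`, `count_from_address`, `over_address_limit` and the IPv4-only `uint32_t` address are all assumptions made for the example, the threshold simply reuses the "25" mentioned above, and the sketch refuses a new connection rather than killing existing ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed threshold, borrowed from the "25" in the text. */
#define MAX_PER_ADDRESS 25

/* Minimal per-child record; the real daemon's data may differ. */
struct child_info {
	int pid;
	uint32_t addr;   /* peer IPv4 address, host byte order */
};

/* Count live children whose peer address equals addr. */
static size_t count_from_address(const struct child_info *children,
				 size_t n, uint32_t addr)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (children[i].addr == addr)
			count++;
	return count;
}

/* Return 1 when one more connection from addr would exceed the limit. */
static int over_address_limit(const struct child_info *children,
			      size_t n, uint32_t addr)
{
	return count_from_address(children, n, addr) >= MAX_PER_ADDRESS;
}
```

The point of keeping the address alongside each child is that a single noisy client can be throttled without lowering the global connection limit for everyone else.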
Similarly, git-upload-pack can be future-proofed by having it have some
data transfer timeout: if it doesn't make any progress at all in <n>
seconds, just kill itself. Things like _that_ are likely to be a lot more
important, I suspect.
And no, I don't think the git protocol should do authentication. It's
hard. If you want to do authentication, you need to do encryption too, and
then you should do something else (but the git protocol _does_ work fine
over an encrypted channel, so the "something else" might be to have some
secure web interface tunnel protocol or similar, and then just support
"git over https" or something ;).
Linus
* Re: Revamping the git protocol
2005-10-20 16:38 ` Linus Torvalds
@ 2005-10-20 16:52 ` H. Peter Anvin
2005-10-20 17:17 ` Linus Torvalds
0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2005-10-20 16:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git
Linus Torvalds wrote:
>
> Similarly, git-upload-pack can be future-proofed by having it have some
> data transfer timeout: if it doesn't make any progress at all in <n>
> seconds, just kill itself. Things like _that_ are likely to be a lot more
> important, I suspect.
>
Right, I already submitted a patch for that.
> And no, I don't think the git protocol should do authentication. It's
> hard. If you want to do authentication, you need to do encryption too, and
> then you should do something else (but the git protocol _does_ work fine
> over an encrypted channel, so the "something else" might be to have some
> secure web interface tunnel protocol or similar, and then just support
> "git over https" or something ;).
git over ssh seems to be the obvious choice.
-hpa
* Re: Revamping the git protocol
2005-10-20 16:52 ` H. Peter Anvin
@ 2005-10-20 17:17 ` Linus Torvalds
2005-10-20 23:35 ` Johannes Schindelin
0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2005-10-20 17:17 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Petr Baudis, Junio C Hamano, git
On Thu, 20 Oct 2005, H. Peter Anvin wrote:
>
> git over ssh seems to be the obvious choice.
Yes, but Petr is right that there might be room for some lighter-weight
"gits" secure protocol. One that doesn't necessarily require a whole user
ID thing.
For example, let's say that you're not the maintainer of your machine, but
you're in an environment where you are allowed to run daemons as yourself
(at a university, for example). And you have a group of people who want to
work together at a project, but they don't want to give write permissions
to the world or their bigger group (group "student").
And git itself _does_ actually support that, already. You can use the
standard "ssh:" thing (or just "hostname:pathname"), and the GIT_SSH
environment variable to set up any tunnelling program you want. Then you
can authenticate any way you want (and encrypt or not, whatever)..
So if somebody is in this situation, maybe we could have an example tunnel
client/server thing that does this.
This is unrelated to the git protocol itself, although the "pack over
ssh/tunnel" obviously uses all the same stuff for the actual transfer.
(It might also be worthwhile to have .git/config specify what program to
use, so that you don't need a global environment variable. It might even
be per-host, ie we could have git-send-pack and git-fetch-pack understand
config language like
[connect]
program=[server.uni.edu]:mytunnel
or something. It shouldn't even be hard to do. Certainly simpler than
doing a good authenticating tunnel).
Linus
* Re: Revamping the git protocol
2005-10-20 17:17 ` Linus Torvalds
@ 2005-10-20 23:35 ` Johannes Schindelin
0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2005-10-20 23:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, Petr Baudis, Junio C Hamano, git
Hi,
On Thu, 20 Oct 2005, Linus Torvalds wrote:
> On Thu, 20 Oct 2005, H. Peter Anvin wrote:
> >
> > git over ssh seems to be the obvious choice.
>
> Yes, but Petr is right that there might be room for some lighter-weight
> "gits" secure protocol. One that doesn't necessarily require a whole user
> ID thing.
>
> For example, let's say that you're not the maintainer of your machine, but
> you're in an environment where you are allowed to run daemons as yourself
> (at a university, for example). And you have a group of people who want to
> work together at a project, but they don't want to give write permissions
> to the world or their bigger group (group "student").
If you are not the maintainer, you could still start an SSH daemon which
listens on a port>1024 and gets its password data from a file different
from /etc/shadow (You could even use PAM...).
Ciao,
Dscho
* Re: Revamping the git protocol
2005-10-20 4:31 Revamping the git protocol H. Peter Anvin
2005-10-20 6:11 ` Junio C Hamano
@ 2005-10-20 16:20 ` Linus Torvalds
1 sibling, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2005-10-20 16:20 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Git Mailing List
On Wed, 19 Oct 2005, H. Peter Anvin wrote:
>
> 1. "Strings" are sequences of bytes prefixed with a length. The length is
> encoded as four lower-case hexadecimal digits. [Why not as 2 or 4 bytes of
> network byte order binary?] When represented in this text as "foo", this
> means the sequence of bytes on the wire is <0003foo>.
As a reason for your "why" - imagine debugging a protocol using telnet..
ASCII really is very nice for things like that.
And no, "foo" is not represented as <0003foo>. It's represented as
<0007foo>, because the length includes the length of the prefix.
The special sequence <0000> is a flush sequence, and it's designed so that
it's supposed to be distinguishable from an empty string <0004>. A <0001>
to <0003> will be rejected as an error. Maximum string length is thus
65531.
(Actually, right now flush is _not_ distinguishable from an empty
string because we return 0 for both cases from packet_read_line(), but the
point being that the packet protocol _supports_ it being distinguishable
if we ever need it to).
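The framing rules just described (length counts the four-digit prefix, <0000> is a flush, <0001> to <0003> are errors, at most 65531 payload bytes) can be sketched as follows. This is a standalone illustration rather than git's own packet code; `pkt_encode`, `pkt_decode_length` and the `PKT_*` constants are hypothetical names, and accepting only lower-case hex digits is an assumption based on the proposal's wording.

```c
#include <stdio.h>
#include <string.h>

#define PKT_HEADER_LEN  4
#define PKT_MAX_PAYLOAD 65531   /* 0xffff minus the 4-byte prefix */
#define PKT_FLUSH       (-1)    /* header was "0000" */
#define PKT_ERROR       (-2)    /* malformed header or "0001".."0003" */

/*
 * Write the framed form of payload into out, which must have room for
 * len + 5 bytes (prefix, payload, trailing NUL).  Returns the number of
 * bytes that go on the wire, or -1 if the payload is too long.
 */
static int pkt_encode(char *out, const char *payload, size_t len)
{
	if (len > PKT_MAX_PAYLOAD)
		return -1;
	sprintf(out, "%04x", (unsigned int)(len + PKT_HEADER_LEN));
	memcpy(out + PKT_HEADER_LEN, payload, len);
	out[PKT_HEADER_LEN + len] = '\0';
	return (int)(len + PKT_HEADER_LEN);
}

/*
 * Decode a four-digit header.  Returns the payload length (>= 0),
 * PKT_FLUSH for the flush sequence, or PKT_ERROR.
 */
static int pkt_decode_length(const char *header)
{
	unsigned int total = 0;

	for (int i = 0; i < PKT_HEADER_LEN; i++) {
		char c = header[i];
		unsigned int digit;

		if (c >= '0' && c <= '9')
			digit = (unsigned int)(c - '0');
		else if (c >= 'a' && c <= 'f')
			digit = (unsigned int)(c - 'a') + 10;
		else
			return PKT_ERROR;
		total = total * 16 + digit;
	}
	if (total == 0)
		return PKT_FLUSH;
	if (total < PKT_HEADER_LEN)
		return PKT_ERROR;
	return (int)(total - PKT_HEADER_LEN);
}
```

Returning distinct values for a flush and for an empty string (<0004>, payload length 0) keeps the two cases distinguishable for callers that care.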
Linus
end of thread, other threads:[~2005-10-21 1:04 UTC | newest]
Thread overview: 10+ messages
2005-10-20 4:31 Revamping the git protocol H. Peter Anvin
2005-10-20 6:11 ` Junio C Hamano
2005-10-20 9:12 ` Petr Baudis
2005-10-20 15:50 ` H. Peter Anvin
2005-10-21 1:04 ` Petr Baudis
2005-10-20 16:38 ` Linus Torvalds
2005-10-20 16:52 ` H. Peter Anvin
2005-10-20 17:17 ` Linus Torvalds
2005-10-20 23:35 ` Johannes Schindelin
2005-10-20 16:20 ` Linus Torvalds