From: Jakub Narebski <jnareb@gmail.com>
To: Scott Chacon <schacon@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
git@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Request for detailed documentation of git pack protocol
Date: Thu, 4 Jun 2009 22:55:43 +0200
Message-ID: <200906042255.43952.jnareb@gmail.com>
In-Reply-To: <200906022339.08639.jnareb@gmail.com>
This is a combined response to various messages in this thread, following
my discoveries made using a simple Perl script (using IO::Socket) which
assumes the role of a git client, tested against github.com (IIRC it uses
a Ruby implementation) and git.kernel.org (C Git), and "nc -l 9418".

By the way, is there some publicly accessible JGit (Java) or Dulwich
(Python) git-daemon one can test against?
sp = Shawn O. Pearce
jn = Jakub Narebski
gb = Git Community Book (http://book.git-scm.com)
jn>> I meant that in the request line for fetching via git:// protocol
jn>>
jn>> 0032git-upload-pack /project.git\\000host=myserver.com\\000
jn>>
jn>> you separate path to repository from extra options using "\0" / NUL
jn>> as a separator. Well, this is only sane separator, as it is path
jn>> terminator, the only character which cannot appear in pathname
jn>> (although I do wonder whether project names with e.g. control
jn>> characters or UTF-8 characters would work correctly).
sp>
sp> No, that isn't the reason '\0' is used here. But yea, that is true.
sp>
sp> The reason \0 is used is, git-daemon reads the 4 byte length, decodes
sp> that, then reads that many bytes. Finally it writes a '\0' at the
sp> end of what it read, so that the entire "line" is NUL terminated.
sp> Then it reads the "command path" part from the resulting C string.
sp>
sp> The host=myserver.com part came later, after many daemons were
sp> already running all over the world. By hiding it behind the '\0'
sp> an old daemon would never see it (but strlen() returned a value that
sp> was less than the length read, but the old daemons didn't care).
sp> Newer daemons look for where strlen() < length, and assume that
sp> the host header follows.
sp>
sp> The host header ends with '\0' in case additional headers would
sp> also appear here in the future. IOW, like HTTP allows new headers
sp> to be added before the "\r\n\r\n" terminator at the body, we allow
sp> them between "\0".
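
To make the description above concrete, here is a minimal Python sketch of
how a daemon-side parser could split the request payload once the 4-byte
length has been stripped. The name parse_daemon_request and the returned
dictionary are my own illustration, not taken from git-daemon; the real C
code works on a NUL-terminated buffer and compares strlen() with the number
of bytes read.

```python
def parse_daemon_request(payload: bytes):
    """Split a git:// request payload into (command, path, headers).

    Hypothetical helper: 'payload' is the pkt-line content without
    its 4-digit hex length prefix.
    """
    # Everything before the first NUL is all an old daemon would see
    # when treating the buffer as a C string.
    command_path, _, rest = payload.partition(b"\0")
    command, _, path = command_path.partition(b" ")

    # Whatever follows is assumed to be NUL-terminated key=value fields,
    # of which "host" is the only one discussed in this thread.
    headers = {}
    for field in rest.split(b"\0"):
        if not field:
            continue  # skip the empty piece after the final NUL
        key, _, value = field.partition(b"=")
        headers[key.decode()] = value.decode()
    return command.decode(), path.decode(), headers


print(parse_daemon_request(b"git-upload-pack /project.git\0host=myserver.com\0"))
```

A request without any NUL simply yields an empty header dictionary, which
mirrors what an old client would send.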
[...]
sp> The NUL at the end of the host name is not strictly required, but
sp> must be present if the client were to ever pass additional options
sp> to the server.
Actually, both git.kernel.org and github.com failed (deadlocked / hung)
when I tried to add an extra key=value parameter at the end of the request:
003bgit-upload-pack /project.git\0host=myserver.com\0user=me\0
Hmmmm...
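
For reference, the length prefix of such a request can be computed with a
few lines of Python. This is only a sketch: pkt_line is a hypothetical
helper, and it assumes that the four hex digits count themselves plus the
payload, which matches the "0009done\n" line discussed in this thread.

```python
def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its pkt-line length (4 hex digits, 0-padded)."""
    length = len(payload) + 4  # the length field counts its own 4 bytes
    if length > 0xFFFF:
        raise ValueError("payload does not fit in a four-hex-digit length")
    return b"%04x" % length + payload


request = pkt_line(b"git-upload-pack /project.git\0host=myserver.com\0user=me\0")
print(request[:4])  # length prefix for the request with the extra parameter
```

With this helper the three-field request gets the prefix 003b, as sent
above. By the same arithmetic the request with only path and host comes out
as 0033 rather than 0032, if I count correctly.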
jn>> Hmmm... the communication between server and client is not entirely
jn>> clean. Do I understand correctly that this NAK is response to
jn>> clients flush after all those "want" lines?
sp>
sp> Yes.
sp>
jn>> And that "0009done" from client
jn>> tells server that it should send everything it has?
sp>
sp> Yes. It means the client will not issue any more "have" lines,
sp> as it has nothing further in its history, so the server just has
sp> to give up and start generating a pack based on what it knows.
Here we were talking about the following part of the exchange:
(I have added a "C:" prefix to signal that this is what the client,
git-clone here, sends; I have also added an explicit "\n" to mark the LF
characters terminating lines, and put each pkt-line on a separate line)
gb> C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack side-band-64k ofs-delta\n
gb> C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
gb> C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
gb> C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
gb> C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
gb> C: 0000
gb> C: 0009done\n
and where the server response is (again the quote from "Git Community Book"
was modified, here removing the doublequotes and the doubling of backslashes):
gb> S: 0008NAK\n
gb> S: 0023\002Counting objects: 2797, done.\n
gb> [...]
gb> S: 2004\001PACK\000\000\000\002 [...]
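
A small Python sketch of how a client might read such a response: it splits
the stream into pkt-lines and separates the side-band channels. The function
names are hypothetical, and the band meanings (1 = pack data, 2 = progress,
3 = error) are my understanding of side-band rather than something stated
in the quoted exchange; the code also assumes that any line not starting
with one of those band bytes is a plain status line such as NAK.

```python
import io


def iter_pkt_lines(stream):
    """Yield each pkt-line payload; yield None for a "0000" flush."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of stream
        length = int(header.decode("ascii"), 16)
        if length == 0:
            yield None
            continue
        # A real socket may return short reads; BytesIO does not.
        yield stream.read(length - 4)


def demux(stream):
    """Yield ("status", line) or (band_number, data) tuples."""
    for payload in iter_pkt_lines(stream):
        if payload is None:
            continue
        if payload[:1] in (b"\x01", b"\x02", b"\x03"):
            yield payload[0], payload[1:]
        else:
            yield "status", payload


sample = (b"0008NAK\n"
          b"0023\x02Counting objects: 2797, done.\n"
          b"000d\x01PACK\x00\x00\x00\x02")
for item in demux(io.BytesIO(sample)):
    print(item)
```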
I had thought that after sending the "0000" flush line the client could wait
for a NAK or ACK response from the server... but that is not the case. When I
tried to read from the server after the "0000" flush and before "0009done\n",
my client (or netcat instance) deadlocked (hung) waiting for the server's
response.

Either I made a mistake in my fake client, or I don't understand the git pack
protocol correctly. Should the client wait for NAK or ACK from the server
_only_ after sending the maximum number of want/have lines (256, if I remember
correctly)?

When I removed the "0000" flush line, my fake client again hung
(deadlocked?) waiting for the server.
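
For anyone who wants to reproduce this, the client side of the exchange
quoted above can be assembled roughly as follows. This is a sketch in
Python rather than my Perl script; build_fetch_request and pkt_line are
hypothetical names, and the code only reproduces the byte sequence from the
"Git Community Book" quote (capabilities on the first "want" line, then a
flush, then "done"). It says nothing about when the server answers.

```python
def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its length: 4 hex digits, counting themselves."""
    return b"%04x" % (len(payload) + 4) + payload


FLUSH = b"0000"  # the special flush packet


def build_fetch_request(wants, capabilities=()):
    """Return the want lines, a flush, and "done" as one byte string."""
    parts = []
    for index, sha in enumerate(wants):
        line = b"want " + sha
        if index == 0 and capabilities:
            # capabilities ride on the first want line only
            line += b" " + b" ".join(capabilities)
        parts.append(pkt_line(line + b"\n"))
    parts.append(FLUSH)
    parts.append(pkt_line(b"done\n"))
    return b"".join(parts)


request = build_fetch_request(
    [b"74730d410fcb6603ace96f1dc55ea6196122532d",
     b"7d1665144a3a975c05f1f43902ddaf084e784dbe"],
    [b"multi_ack", b"side-band-64k", b"ofs-delta"],
)
print(request.decode("ascii"))
```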
jn>> P.S. By the way, is pkt-line format original invention, or was it
jn>> 'borrowed' from some other standard or protocol?
sp>
sp> No clue. I find it f'king odd that the length is in hex. There
sp> isn't much value to the protocol being human readable. The PACK
sp> part of the stream sure as hell ain't. You aren't going to type
sp> out a sequence of "have" lines against the remote, like you could
sp> with say an HTTP GET. *shrug*
"git gui blame pkt-line.c" shows that the pkt-line format is Linus's invention.

It looks quite a bit like the 'chunked' transfer encoding[1] in HTTP; there
each non-empty chunk starts with the number of octets of data it
embeds (the size written in hexadecimal), followed by a CRLF (carriage return
and linefeed) and the data itself. The chunk is then closed with a CRLF.
In some implementations, space characters (0x20) are padded between the
chunk-size and the CRLF. In the pkt-line format the number of octets has a
fixed width (4 hexadecimal digits, 0-padded), and we do not use CRLF as the
terminator of the chunk/packet length or of the chunk/packet itself.

In the HTTP 'chunked' transfer encoding the last chunk is a single line,
simply made of the chunk-size (0). In the pkt-line format we use the special
size of "0000" for a flush packet.
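
To show the two framings side by side, here is a short Python sketch that
encodes the same payload both ways. The helper names are mine. One
difference worth noting, judging from "0009done\n": the pkt-line length
appears to include its own four digits, whereas the HTTP chunk-size counts
only the data octets.

```python
def http_chunk(data: bytes) -> bytes:
    """One non-empty HTTP chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def pkt_line(data: bytes) -> bytes:
    """One pkt-line: fixed-width hex length (including itself), then data."""
    return b"%04x" % (len(data) + 4) + data


HTTP_LAST_CHUNK = b"0\r\n"   # single line made of the chunk-size 0
PKT_FLUSH = b"0000"          # flush packet

payload = b"done\n"
print(http_chunk(payload))
print(pkt_line(payload))
```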
[1] http://en.wikipedia.org/wiki/Chunked_transfer_encoding
--
Jakub Narebski
Poland