* More on git over HTTP POST
From: H. Peter Anvin @ 2008-08-01 21:50 UTC
To: Git Mailing List

Hi all,

I have investigated a bit what it would take to support git protocol
(smart transport) over HTTP POST transactions.

The current proxy system is broken, for a very simple reason: it doesn't
convey information about when the channel should be turned around.

HTTP POST -- or, for that matter, any RPC-style transport -- is a
half-duplex transport: only one direction can be active at a time, after
which the channel has to be explicitly turned around. The "turning
around" consists of posting the queued transaction and listening for the
reply.

Ultimately, it comes down to the following: the transactor needs to be
given explicit information when the git protocol goes from writing to
reading (the opposite direction information is obvious.)

I was hoping that it would be possible to get this information from
snooping the protocol, but I don't seem to be so lucky.

I started to hack on a variant which would embed a VFS-style interface
in git itself, looking something like:

	struct transactor;

	struct transact_ops {
		ssize_t (*read)(struct transactor *, void *, size_t);
		ssize_t (*write)(struct transactor *, const void *, size_t);
		int (*close)(struct transactor *);
	};

	struct transactor {
		union {
			void *p;
			intptr_t i;
		} u;
		const struct transact_ops *ops;
	};

Replacing the usual fd operations with this interface would allow a
different transactor to see the phase changes explicitly; the
replacements for xread() and xwrite() are obvious.

Of course, I started hacking on it and found myself with zero time to
continue, but I thought I'd post what I had come up with.

	-hpa
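A minimal sketch of how a transactor along these lines might make the
write-to-read phase change visible, assuming an in-memory stand-in for
the HTTP round trip. The two structs are the ones from the message
above; struct rpc_state, rpc_turn_around() and the echo behaviour are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

struct transactor;

struct transact_ops {
	ssize_t (*read)(struct transactor *, void *, size_t);
	ssize_t (*write)(struct transactor *, const void *, size_t);
	int (*close)(struct transactor *);
};

struct transactor {
	union {
		void *p;
		intptr_t i;
	} u;
	const struct transact_ops *ops;
};

/* Hypothetical backend state: a queued request and the pending reply. */
struct rpc_state {
	char request[256];
	size_t request_len;
	char reply[256];
	size_t reply_len, reply_pos;
	int turnarounds;	/* number of times the channel was turned around */
};

/*
 * Stand-in for "POST the queued data and collect the response".
 * This sketch simply echoes the request back as the reply.
 */
static void rpc_turn_around(struct rpc_state *s)
{
	assert(s->request_len <= sizeof(s->reply));
	memcpy(s->reply, s->request, s->request_len);
	s->reply_len = s->request_len;
	s->reply_pos = 0;
	s->request_len = 0;
	s->turnarounds++;
}

/* Writes are only queued; nothing goes "on the wire" yet. */
static ssize_t rpc_write(struct transactor *t, const void *buf, size_t len)
{
	struct rpc_state *s = t->u.p;
	size_t room = sizeof(s->request) - s->request_len;

	if (len > room)
		len = room;
	memcpy(s->request + s->request_len, buf, len);
	s->request_len += len;
	return (ssize_t)len;
}

/* The first read after a write is the explicit turn-around point. */
static ssize_t rpc_read(struct transactor *t, void *buf, size_t len)
{
	struct rpc_state *s = t->u.p;
	size_t avail;

	if (s->request_len)
		rpc_turn_around(s);
	avail = s->reply_len - s->reply_pos;
	if (len > avail)
		len = avail;
	memcpy(buf, s->reply + s->reply_pos, len);
	s->reply_pos += len;
	return (ssize_t)len;
}

static int rpc_close(struct transactor *t)
{
	(void)t;
	return 0;
}

static const struct transact_ops rpc_ops = { rpc_read, rpc_write, rpc_close };
```

The backend learns about the turn-around from the first read() that
follows a write(), so it does not have to snoop the git protocol to
decide when to post the queued data.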
* Re: More on git over HTTP POST
From: Shawn O. Pearce @ 2008-08-02 20:57 UTC
To: H. Peter Anvin; +Cc: Git Mailing List

"H. Peter Anvin" <hpa@zytor.com> wrote:
> I have investigated a bit what it would take to support git protocol
> (smart transport) over HTTP POST transactions.

I have started to think about this more myself, not just for POST
but also for some form of GET that can return an efficient pack,
rather than making the client walk the object chains itself.

Have you looked at the Mercurial wire protocol? It runs over HTTP
and uses a relatively efficient means of deciding where to cut the
transfer.

  http://www.selenic.com/mercurial/wiki/index.cgi/WireProtocol

Most of their smarts are in the branches() and between() operations.
Unfortunately this documentation isn't very complete and/or there
are some simplifications that the Mercurial team took due to their
repository format not initially supporting multiple branches like
the Git format does.

> The current proxy system is broken, for a very simple reason: it doesn't
> convey information about when the channel should be turned around.

Well, over git:// (or any protocol that wraps git:// like ssh) we
assume a full-duplex channel. Some proxy systems are able to do such
a channel. HTTP however does not offer it.

> I started to hack on a variant which would embed a VFS-style interface
> in git itself, looking something like:
>
> 	struct transactor;
>
> 	struct transact_ops {
> 		ssize_t (*read)(struct transactor *, void *, size_t);
> 		ssize_t (*write)(struct transactor *, const void *, size_t);
> 		int (*close)(struct transactor *);
> 	};

No, the git:// protocol implementation in fetch-pack/upload-pack runs
more efficiently than that by keeping a sliding window of stuff that is
in-flight. It's, I guess, two async RPCs running in parallel, but from
the client and server perspective both RPCs go into the same
computation.

HTTP POST is actually trivial if you don't want to support the new
tell-me-more extension that was added to git-push. Hell, I could
write the CGI in a few minutes I think. It's really just a small
wrapper around git-receive-pack.

What's a bitch is the efficient fetch, and getting tell-me-more to
work on push.

-- 
Shawn.
* Re: More on git over HTTP POST
From: Daniel Stenberg @ 2008-08-02 21:00 UTC
To: Shawn O. Pearce; +Cc: Git Mailing List

On Sat, 2 Aug 2008, Shawn O. Pearce wrote:

> Well, over git:// (or any protocol that wraps git:// like ssh) we assume a
> full-duplex channel. Some proxy systems are able to do such a channel.
> HTTP however does not offer it.

Yes it does. The CONNECT method is used to get a full-duplex channel to
a remote site through a HTTP proxy. The downside with that is of course
that most proxies are set up to disallow CONNECT to ports other than 443
(the https default port).

-- 
 / daniel.haxx.se
* Re: More on git over HTTP POST
From: Shawn O. Pearce @ 2008-08-02 21:08 UTC
To: Daniel Stenberg; +Cc: Git Mailing List

Daniel Stenberg <daniel@haxx.se> wrote:
> On Sat, 2 Aug 2008, Shawn O. Pearce wrote:
>
>> Well, over git:// (or any protocol that wraps git:// like ssh) we
>> assume a full-duplex channel. Some proxy systems are able to do such a
>> channel. HTTP however does not offer it.
>
> Yes it does. The CONNECT method is used to get a full-duplex channel to a
> remote site through a HTTP proxy. The downside with that is of course
> that most proxies are set up to disallow CONNECT to ports other than 443
> (the https default port).

Ah, yes. CONNECT. Very few servers wind up supporting it I think.

I know one very big company who cannot use or support Git because
Git over HTTP is too slow to be useful. They support other tools
like Subversion instead. :-|

Really we just need smart protocol support in half-duplex RPC like
hpa was going after. Then it doesn't matter what we serialize it
into, almost any RPC system will be useful. Of course the only one
that probably matters in practice is HTTP.

-- 
Shawn.
* Re: More on git over HTTP POST
From: Petr Baudis @ 2008-08-02 21:23 UTC
To: Shawn O. Pearce; +Cc: Daniel Stenberg, Git Mailing List

On Sat, Aug 02, 2008 at 02:08:28PM -0700, Shawn O. Pearce wrote:
> I know one very big company who cannot use or support Git because
> Git over HTTP is too slow to be useful. They support other tools
> like Subversion instead. :-|

On what projects? I'm currently using Git over HTTP (read-only) a lot
and it doesn't seem really all that impractical to me. Maybe just using
a more dumb-friendly packing scheme could help a lot?

				Petr "Pasky" Baudis
* Re: More on git over HTTP POST
From: Shawn O. Pearce @ 2008-08-02 21:32 UTC
To: Petr Baudis; +Cc: Daniel Stenberg, Git Mailing List

Petr Baudis <pasky@suse.cz> wrote:
> On Sat, Aug 02, 2008 at 02:08:28PM -0700, Shawn O. Pearce wrote:
> > I know one very big company who cannot use or support Git because
> > Git over HTTP is too slow to be useful. They support other tools
> > like Subversion instead. :-|
>
> On what projects? I'm currently using Git over HTTP (read-only) a lot
> and it doesn't seem really all that impractical to me. Maybe just using
> a more dumb-friendly packing scheme could help a lot?

They tested by taking the SVN source code and importing it into
both Git and Hg, then cloned them both over a WAN link. Git was
22x slower. I suspect they didn't pack the Git repository at all,
so Git had to issue thousands of HTTP GET requests for the loose
objects. But I also suspect there was bias in the testing so they
didn't realize they needed to repack, and didn't care to find out.

I've probably already said too much. I'm under NDAs.

But anyway. The point I was trying to make was that there are not
just some proxy servers, but also some server platforms, that cannot
handle bidirectional communication. E.g. servers that are behind
reverse proxies, where the reverse proxy is acting as a sort of
firewall or content cache accelerator.

-- 
Shawn.
* Re: More on git over HTTP POST
From: Shawn O. Pearce @ 2008-08-03 2:56 UTC
To: H. Peter Anvin; +Cc: Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> > I have investigated a bit what it would take to support git protocol
> > (smart transport) over HTTP POST transactions.
>
> I have started to think about this more myself, not just for POST
> but also for some form of GET that can return an efficient pack,
> rather than making the client walk the object chains itself.
...
> HTTP POST is actually trivial if you don't want to support the new
> tell-me-more extension that was added to git-push. Hell, I could
> write the CGI in a few minutes I think. It's really just a small
> wrapper around git-receive-pack.

So I have this draft of how smart push might work. It's slated for
the Documentation/technical directory. Thus far I have only written
about push support, but Ilari on #git has some ideas about how to
do a smart fetch protocol.

Implementation-wise in C git I think this is just a new C program
(git-http-backend?) that turns around and proxies into
git-receive-pack, at least for the push support. What I don't know
is how we could configure URI translation from
/path/to/repository.git received out of the $PATH_INFO in the CGI
environment to a physical directory. Should we rely on the server's
$PATH_TRANSLATED?


Smart HTTP transfer protocols
=============================

Git supports two HTTP based transfer protocols. A "dumb" protocol
which requires only a standard HTTP server on the server end of the
connection, and a "smart" protocol which requires a Git aware CGI
(or server module). This document describes the "smart" protocol.

Authentication
--------------

Standard HTTP authentication is used, and must be configured and
enforced by the HTTP server software.

Chunked Transfer Encoding
-------------------------

For performance reasons the HTTP/1.1 chunked transfer encoding is
used frequently to transfer variable length objects. This avoids
needing to produce large results in memory to compute the proper
content-length.

Detecting Smart Servers
-----------------------

HTTP clients can detect a smart Git-aware server by sending the
show-ref request (below) to the server. If the response has a
status of 200 and the magic x-application/git-refs content type
then the server can be assumed to be a smart Git-aware server.

If any other response is received the client must assume dumb
protocol support, as the server did not correctly respond to
the request.

Show Refs
---------

Obtains the available refs from the remote repository. The response
is a sequence of git "packet lines", one per ref, and a final flush
packet line to indicate the end of stream.

	C: GET /path/to/repository.git?show-ref HTTP/1.0

	S: HTTP/1.1 200 OK
	S: Content-Type: x-application/git-refs
	S: Transfer-Encoding: chunked
	S:
	S: 62
	S: 003e95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
	S:
	S: 63
	S: 003fd049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
	S:
	S: 59
	S: 003b2cb58b79488a98d2721cea644875a8dd0026b115 refs/heads/pu
	S:
	S: 4
	S: 0000
	S: 0

Push Pack
---------

Uploads a pack and updates refs. The start of the stream is the
commands to update the refs and the remainder of the stream is the
pack file itself. See git-receive-pack and its network protocol
in pack-protocol.txt, as this is essentially the same.

	C: POST /path/to/repository.git?receive-pack HTTP/1.0
	C: Content-Type: x-application/git-receive-pack
	C: Transfer-Encoding: chunked
	C:
	C: 103
	C: 006395dcfa3633004da0049d3d0fa03f80589cbcaf31 d049f6c27a2244e12041955e262a404c7faba355 refs/heads/maint
	C: 4
	C: 0000
	C: 12
	C: PACK
	...
	C: 0

	S: HTTP/1.0 200 OK
	S: Content-type: x-application/git-status
	S: Transfer-Encoding: chunked
	S:
	S: ...<output of receive-pack>...

-- 
Shawn.
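The "packet line" framing used in the Show Refs example can be sketched
as follows. This is an illustration under stated assumptions, not the
code in git's pkt-line.c; pkt_line_format() and pkt_flush_format() are
hypothetical names. Each packet is four hex digits giving the total
length (the four header bytes included) followed by the payload, and
"0000" is the flush packet.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one pkt-line into out: a 4-digit hex length that counts the
 * header itself, followed by the payload.  Returns the number of bytes
 * written (excluding the trailing NUL), or -1 if it does not fit.
 */
static int pkt_line_format(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload) + 4;
	int n;

	if (len > 0xffff || len >= outsz)
		return -1;
	n = snprintf(out, outsz, "%04x%s", (unsigned)len, payload);
	assert(n == (int)len);
	return n;
}

/* A flush packet is the literal "0000" and carries no payload. */
static int pkt_flush_format(char *out, size_t outsz)
{
	if (outsz < 5)
		return -1;
	memcpy(out, "0000", 5);
	return 4;
}
```

For the refs/heads/maint line in the example the payload is 58 bytes
(40 hex digits, a space, the 16-byte ref name and a newline), so the
packet is 62 bytes long and starts with "003e".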
* Re: More on git over HTTP POST
From: Junio C Hamano @ 2008-08-03 3:27 UTC
To: Shawn O. Pearce; +Cc: H. Peter Anvin, Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Show Refs
> ---------
>
> Obtains the available refs from the remote repository. The response
> is a sequence of git "packet lines", one per ref, and a final flush
> packet line to indicate the end of stream.

As this is the initial protocol exchange request, I suspect that you
would regret it if you do not leave room for some "capability
advertisement" in this exchange.

With the git native protocol, we luckily found space to do so after the
ref payload (because pkt-line is "length + payload" format but the code
that reads payload happened to ignore anything after NUL). You would
want to define how these are given by the server to the client over the
HTTP channel. For example, putting them on extra HTTP headers is
probably Ok.
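A minimal sketch of the trick described above, assuming the reader
already has one pkt-line payload and its length in hand. The function
name pkt_capabilities() is hypothetical, and the capability names in
the usage note are only examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Given one pkt-line payload of len bytes, return a pointer to whatever
 * follows the first NUL and store its length in *caps_len.  Returns NULL
 * when the payload has no NUL or nothing after it.  The returned bytes
 * are not guaranteed to be NUL-terminated, so callers must honour
 * *caps_len.
 */
static const char *pkt_capabilities(const char *payload, size_t len,
				    size_t *caps_len)
{
	const char *nul;
	size_t used;

	assert(payload != NULL);
	nul = memchr(payload, '\0', len);
	if (!nul)
		return NULL;
	used = (size_t)(nul - payload) + 1;
	if (used >= len)
		return NULL;
	*caps_len = len - used;
	return nul + 1;
}
```

A reader that stops at the NUL sees only the ref, while a newer reader
can call this on the same payload, for instance one ending in
"refs/heads/maint\0report-status delete-refs", to pick up the
advertised list.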
* Re: More on git over HTTP POST
From: Shawn O. Pearce @ 2008-08-03 3:31 UTC
To: Junio C Hamano; +Cc: H. Peter Anvin, Git Mailing List

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
>
> > Show Refs
> > ---------
> >
> > Obtains the available refs from the remote repository. The response
> > is a sequence of git "packet lines", one per ref, and a final flush
> > packet line to indicate the end of stream.
>
> As the initial protocol exchange request, I suspect that you would regret
> if you do not leave room for some "capability advertisement" in this
> exchange.
>
> With the git native protocol, we luckily found space to do so after the
> ref payload (because pkt-line is "length + payload" format but the code
> that reads payload happened to ignore anything after NUL). You would want
> to define how these are given by the server to the client over HTTP
> channel. For example, putting them on extra HTTP headers is probably Ok.

Yea, I thought that the HTTP headers would be more than enough space
to add capability advertisements. Most client libraries will happily
parse and store these for the application, and won't make a fuss if
the application doesn't read them. Hence there's more than enough
room in the protocol to extend it in the future with additional
capabilities.

We do have to be careful though. Any cacheable resource must only
rely upon the URI and the standard headers which compute into the
cache key for a request. There aren't many, though I think the
Content-Type header may be among them.

-- 
Shawn.
* Re: More on git over HTTP POST
From: H. Peter Anvin @ 2008-08-03 3:47 UTC
To: Junio C Hamano; +Cc: Shawn O. Pearce, Git Mailing List

Junio C Hamano wrote:
> With the git native protocol, we luckily found space to do so after the
> ref payload (because pkt-line is "length + payload" format but the code
> that reads payload happened to ignore anything after NUL). You would want
> to define how these are given by the server to the client over HTTP
> channel. For example, putting them on extra HTTP headers is probably Ok.

I think that would be a mistake, just because it's one more thing for
proxies to screw up on. It's better to have negotiation information in
the payload, before the "real" data.

Obviously one thing that needs to be included in each transaction is a
transaction ID that will be reported back on the next transaction, since
you can't rely on a persistent connection.

	-hpa
* Re: More on git over HTTP POST
From: Shawn O. Pearce @ 2008-08-03 4:10 UTC
To: H. Peter Anvin; +Cc: Junio C Hamano, Git Mailing List

"H. Peter Anvin" <hpa@zytor.com> wrote:
> Junio C Hamano wrote:
>> For example, putting them [capabilities] on extra HTTP headers is probably Ok.
>
> I think that would be a mistake, just because it's one more thing for
> proxies to screw up on.

I didn't realize we were in an era of proxies that are that
brain-damaged that they cannot relay the other headers. The Amazon
S3 service relies heavily upon their own extended headers to make
their REST API work. If proxies stripped that stuff out then the
client wouldn't work at all.

IOW I had thought we were past this dark age of the Internet.

> It's better to have negotiation information in
> the payload, before the "real" data.

I guess I could do that. At least for the really complex stuff.

> Obviously one thing that needs to be included in each transaction is a
> transaction ID that will be reported back on the next transaction, since
> you can't rely on a persistent connection.

No. That requires the server to maintain state. We don't want to
do that if we can avoid it. I would much rather have the clients
handle the state management as it simplifies the server side,
especially when you start talking about reverse proxies and/or
load-balancers running in front of the server farm.

-- 
Shawn.
* Re: More on git over HTTP POST
From: david @ 2008-08-03 8:10 UTC
To: Shawn O. Pearce; +Cc: H. Peter Anvin, Junio C Hamano, Git Mailing List

On Sat, 2 Aug 2008, Shawn O. Pearce wrote:

>
> "H. Peter Anvin" <hpa@zytor.com> wrote:
>> Junio C Hamano wrote:
>>> For example, putting them [capabilities] on extra HTTP headers is probably Ok.
>>
>> I think that would be a mistake, just because it's one more thing for
>> proxies to screw up on.
>
> I didn't realize we were in an era of proxies that are that
> brain-damaged that they cannot relay the other headers. The Amazon
> S3 service relies heavily upon their own extended headers to make
> their REST API work. If proxies stripped that stuff out then the
> client wouldn't work at all.
>
> IOW I had thought we were past this dark age of the Internet.

actually, it's not just a matter of not getting 'past this dark age of
the Internet', it's an issue that so many people are tunneling
_everything_ over http (including the bad guys tunneling malware) that
proxies are getting more aggressive than they have ever been before in
pulling apart the payload and analysing it before letting it get through
to the far side.

David Lang

>> It's better to have negotiation information in
>> the payload, before the "real" data.
>
> I guess I could do that. At least for the really complex stuff.
>
>> Obviously one thing that needs to be included in each transaction is a
>> transaction ID that will be reported back on the next transaction, since
>> you can't rely on a persistent connection.
>
> No. That requires the server to maintain state. We don't want to
> do that if we can avoid it. I would much rather have the clients
> handle the state management as it simplifies the server side,
> especially when you start talking about reverse proxies and/or
> load-balancers running in front of the server farm.
>
>
* Re: More on git over HTTP POST
From: H. Peter Anvin @ 2008-08-03 11:42 UTC
To: david; +Cc: Shawn O. Pearce, Junio C Hamano, Git Mailing List

david@lang.hm wrote:
>
> actually, it's not just a matter of not getting 'past this dark age of
> the Internet', it's an issue that so many people are tunneling
> _everything_ over http (including the bad guys tunneling malware) that
> proxies are getting more aggressive than they have ever been before in
> pulling apart the payload and analysing it before letting it get through
> to the far side.
>

... which is of course because of said proxies that this is happening,
too. There are too many idiots out there building "security software"
and running IT departments, that's really the bottom line.

By the way, I want to say *thank you* to Shawn for tackling this
project: this has been a major issue for kernel.org, and getting
something like this deployed would be incredibly helpful.

	-hpa
* Re: More on git over HTTP POST
From: H. Peter Anvin @ 2008-08-03 11:29 UTC
To: Shawn O. Pearce; +Cc: Junio C Hamano, Git Mailing List

Shawn O. Pearce wrote:
>
> IOW I had thought we were past this dark age of the Internet.
>

If we were, there wouldn't be a need for this project at all. The whole
purpose of it is to deal with corporate proxies that try to prevent
actual communication because of "security", and it's really hard to
predict what utterly arbitrary heuristics they have applied.

	-hpa
* Re: More on git over HTTP POST
From: H. Peter Anvin @ 2008-08-03 3:51 UTC
To: Shawn O. Pearce; +Cc: Git Mailing List

Shawn O. Pearce wrote:
> Chunked Transfer Encoding
> -------------------------
>
> For performance reasons the HTTP/1.1 chunked transfer encoding is
> used frequently to transfer variable length objects. This avoids
> needing to produce large results in memory to compute the proper
> content-length.

Note: you cannot rely on HTTP/1.1 being supported by an intermediate
proxy; you might have to handle HTTP/1.0, where the data is terminated
by connection close.

	-hpa
* Re: More on git over HTTP POST
From: Shawn O. Pearce @ 2008-08-03 4:12 UTC
To: H. Peter Anvin; +Cc: Git Mailing List

"H. Peter Anvin" <hpa@zytor.com> wrote:
> Shawn O. Pearce wrote:
>> Chunked Transfer Encoding
>> -------------------------
>>
>> For performance reasons the HTTP/1.1 chunked transfer encoding is
>> used frequently to transfer variable length objects. This avoids
>> needing to produce large results in memory to compute the proper
>> content-length.
>
> Note: you cannot rely on HTTP/1.1 being supported by an intermediate
> proxy; you might have to handle HTTP/1.0, where the data is terminated
> by connection close.

Well, that proxy is going to be crying when we upload a 120M pack
during a push to it, and it buffers the damn thing to figure out
the proper Content-Length so it can convert an HTTP/1.1 client
request into an HTTP/1.0 request to forward to the server. That's
just _stupid_.

But from the client side perspective the chunked transfer encoding
is used only to avoid generating in advance and producing the
content-length header. I fully expect the encoding to disappear
(e.g. in a proxy, or in the HTTP client library) before any sort
of Git code gets its fingers on the data.

Hence to your other remark, I _do not_ rely upon the encoding
boundaries to remain intact. That is why there is Git pkt-line
encodings inside of the HTTP data stream. We can rely on the
pkt-line encoding being present, even if the HTTP chunks were
moved around (or removed entirely) by a proxy.

-- 
Shawn.
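A minimal sketch of why the pkt-line layer makes chunk boundaries
irrelevant: the reader below accepts bytes in fragments of any size and
only acts once a whole packet has arrived. struct pkt_reader and
pkt_reader_feed() are hypothetical names for illustration, not git's
own reader, and the fixed 1024-byte buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pkt_reader {
	char buf[1024];		/* bytes received but not yet consumed */
	size_t len;
	unsigned lines;		/* complete data pkt-lines seen so far */
	unsigned flushes;	/* flush packets ("0000") seen so far */
};

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Feed n more bytes, however the transport happened to split them.
 * Returns 0 on success, -1 on a malformed length or buffer overflow.
 */
static int pkt_reader_feed(struct pkt_reader *r, const char *data, size_t n)
{
	size_t pos = 0;

	if (n > sizeof(r->buf) - r->len)
		return -1;
	memcpy(r->buf + r->len, data, n);
	r->len += n;

	while (r->len - pos >= 4) {
		size_t plen = 0;
		int i;

		for (i = 0; i < 4; i++) {
			int v = hexval(r->buf[pos + i]);
			if (v < 0)
				return -1;
			plen = plen * 16 + (size_t)v;
		}
		if (plen == 0) {		/* flush packet */
			r->flushes++;
			pos += 4;
			continue;
		}
		if (plen < 4)
			return -1;		/* length must cover its own header */
		if (r->len - pos < plen)
			break;			/* packet incomplete, wait for more */
		r->lines++;
		pos += plen;
	}

	/* Keep the unconsumed tail for the next call. */
	assert(pos <= r->len);
	memmove(r->buf, r->buf + pos, r->len - pos);
	r->len -= pos;
	return 0;
}
```

Feeding the stream "0009hello0007abc0000" three bytes at a time or in
one call yields the same result, two data packets and one flush, which
is the property the message above relies on.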
* Re: More on git over HTTP POST
From: H. Peter Anvin @ 2008-08-03 11:31 UTC
To: Shawn O. Pearce; +Cc: Git Mailing List

Shawn O. Pearce wrote:
>
> But from the client side perspective the chunked transfer encoding
> is used only to avoid generating in advance and producing the
> content-length header. I fully expect the encoding to disappear
> (e.g. in a proxy, or in the HTTP client library) before any sort
> of Git code gets its fingers on the data.
>
> Hence to your other remark, I _do not_ rely upon the encoding
> boundaries to remain intact. That is why there is Git pkt-line
> encodings inside of the HTTP data stream. We can rely on the
> pkt-line encoding being present, even if the HTTP chunks were
> moved around (or removed entirely) by a proxy.
>

Excellent. I did not mean that as criticism, obviously, I just wanted
that to be clear.

HTTP/1.1 does chunked encoding, and HTTP/1.0 does terminate on
connection close; both serve the same purpose.

	-hpa
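The two framing options can be sketched as below, assuming a CGI that
learns the request's protocol version from SERVER_PROTOCOL (a standard
CGI variable). protocol_can_chunk() and format_chunk() are hypothetical
helpers for illustration only. An HTTP/1.0 peer would get the raw bytes
and see end-of-data when the connection closes; an HTTP/1.1 peer may be
sent chunks, each framed as size in hexadecimal, CRLF, data, CRLF.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Chunked transfer encoding only exists from HTTP/1.1 on. */
static int protocol_can_chunk(const char *server_protocol)
{
	return server_protocol && !strcmp(server_protocol, "HTTP/1.1");
}

/*
 * Frame len bytes of data as one HTTP/1.1 chunk in out.
 * Returns the number of bytes written, or -1 if out is too small.
 * A zero-length chunk ("0\r\n\r\n") marks the end of the body.
 */
static int format_chunk(char *out, size_t outsz, const char *data, size_t len)
{
	int hdr;

	assert(out != NULL && data != NULL);
	hdr = snprintf(out, outsz, "%zx\r\n", len);
	if (hdr < 0 || (size_t)hdr + len + 2 > outsz)
		return -1;
	memcpy(out + hdr, data, len);
	memcpy(out + hdr + len, "\r\n", 2);
	return hdr + (int)len + 2;
}
```

Note that HTTP/1.1 writes the chunk size in hexadecimal, so a 62-byte
chunk is introduced by "3e" and the 4-byte flush packet by "4".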
* Re: More on git over HTTP POST
From: H. Peter Anvin @ 2008-08-03 4:01 UTC
To: Shawn O. Pearce; +Cc: Git Mailing List

Shawn O. Pearce wrote:
> Chunked Transfer Encoding
> -------------------------
>
> For performance reasons the HTTP/1.1 chunked transfer encoding is
> used frequently to transfer variable length objects. This avoids
> needing to produce large results in memory to compute the proper
> content-length.

One more thing about chunked transfer encodings: you cannot assume that
a proxy will maintain chunk boundaries, any more than you can assume
that a firewall will maintain TCP packet boundaries.

> Detecting Smart Servers
> -----------------------
>
> HTTP clients can detect a smart Git-aware server by sending the
> show-ref request (below) to the server. If the response has a
> status of 200 and the magic x-application/git-refs content type
> then the server can be assumed to be a smart Git-aware server.
>
> If any other response is received the client must assume dumb
> protocol support, as the server did not correctly respond to
> the request.

I think it should be application/x-git-refs, but that's splitting hairs.

> Obtains the available refs from the remote repository. The response
> is a sequence of git "packet lines", one per ref, and a final flush
> packet line to indicate the end of stream.
>
> 	C: GET /path/to/repository.git?show-ref HTTP/1.0
>

I really think it would make more sense to use POST requests for
everything, and have the command be part of the POSTed payload. Putting
stuff in the URL just complicates the namespace to the detriment of the
admin.

> 	S: HTTP/1.1 200 OK
> 	S: Content-Type: x-application/git-refs
> 	S: Transfer-Encoding: chunked

Transfer-encoding: chunked is illegal with a HTTP/1.0 client.

	-hpa
* Re: More on git over HTTP POST
From: Mike Hommey @ 2008-08-03 6:43 UTC
To: Shawn O. Pearce; +Cc: Git Mailing List

On Sat, Aug 02, 2008 at 07:56:02PM -0700, Shawn O. Pearce wrote:
> Smart HTTP transfer protocols
> =============================
>
> Git supports two HTTP based transfer protocols. A "dumb" protocol
> which requires only a standard HTTP server on the server end of the
> connection, and a "smart" protocol which requires a Git aware CGI
> (or server module). This document describes the "smart" protocol.

If you want, I have a patch series that introduces a small API to make
HTTP requests easier to make.

Mike
* [RFC 1/2] Add backdoor options to receive-pack for use in Git-aware CGI
From: Shawn O. Pearce @ 2008-08-03 7:25 UTC
To: git, H. Peter Anvin

The new --report-status flag forces the status report feature of
the push protocol to be enabled. This can be useful in a CGI program
that implements the server side of a "smart" Git-aware HTTP transport.
The CGI code can perform the selection of the feature and ask
receive-pack to enable it automatically.

The new --no-advertise-heads causes receive-pack to bypass its usual
display of known refs to the client, and instead immediately start
reading the commands and pack from stdin. This is useful in a CGI
situation where we want to hand off all input to receive-pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 receive-pack.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/receive-pack.c b/receive-pack.c
index d44c19e..512eae6 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -464,6 +464,7 @@ static int delete_only(struct command *cmd)

 int main(int argc, char **argv)
 {
+	int advertise_heads = 1;
 	int i;
 	char *dir = NULL;

@@ -472,7 +473,15 @@ int main(int argc, char **argv)
 		char *arg = *argv++;

 		if (*arg == '-') {
-			/* Do flag handling here */
+			if (!strcmp(arg, "--report-status")) {
+				report_status = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--no-advertise-heads")) {
+				advertise_heads = 0;
+				continue;
+			}
+
 			usage(receive_pack_usage);
 		}
 		if (dir)
@@ -497,10 +506,10 @@ int main(int argc, char **argv)
 	else if (0 <= receive_unpack_limit)
 		unpack_limit = receive_unpack_limit;

-	write_head_info();
-
-	/* EOF */
-	packet_flush(1);
+	if (advertise_heads) {
+		write_head_info();
+		packet_flush(1);
+	}

 	read_head_info();
 	if (commands) {
-- 
1.6.0.rc1.221.g9ae23
* [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport
From: Shawn O. Pearce @ 2008-08-03 7:25 UTC
To: git, H. Peter Anvin

This CGI can be loaded into an Apache server using ScriptAlias,
such as with the following configuration:

  LoadModule cgi_module /usr/libexec/apache2/mod_cgi.so
  LoadModule alias_module /usr/libexec/apache2/mod_alias.so
  ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

Repositories are accessed via the translated PATH_INFO.

The CGI is backwards compatible with the dumb client, allowing the
client to detect the server's smarts by looking at the content-type
returned from "GET /repo.git/info/refs". If the returned content
type is the magic application/x-git-refs type then the client can
assume the server is Git-aware.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .gitignore                                |    1 +
 Documentation/technical/http-protocol.txt |   88 +++++++++
 Makefile                                  |    1 +
 http-backend.c                            |  302 +++++++++++++++++++++++++++++
 4 files changed, 392 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/http-protocol.txt
 create mode 100644 http-backend.c

diff --git a/.gitignore b/.gitignore
index a213e8e..02eaf3a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -51,6 +51,7 @@ git-gc
 git-get-tar-commit-id
 git-grep
 git-hash-object
+git-http-backend
 git-http-fetch
 git-http-push
 git-imap-send
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
new file mode 100644
index 0000000..6cb96f3
--- /dev/null
+++ b/Documentation/technical/http-protocol.txt
@@ -0,0 +1,88 @@
+Smart HTTP transfer protocols
+=============================
+
+Git supports two HTTP based transfer protocols. A "dumb" protocol
+which requires only a standard HTTP server on the server end of the
+connection, and a "smart" protocol which requires a Git aware CGI
+(or server module). This document describes the "smart" protocol.
+
+As a design feature smart servers automatically degrade to the
+dumb protocol when speaking with a dumb client. This may cause
+more load to be placed on the server as the file GET requests are
+handled by a CGI rather than the server itself.
+
+
+Authentication
+--------------
+
+Standard HTTP authentication is used, and must be configured and
+enforced by the HTTP server software.
+
+Chunked Transfer Encoding
+-------------------------
+
+For performance reasons the HTTP/1.1 chunked transfer encoding is
+used frequently to transfer variable length objects. This avoids
+needing to produce large results in memory to compute the proper
+content-length.
+
+Detecting Smart Servers
+-----------------------
+
+HTTP clients can detect a smart Git-aware server by sending the
+/info/refs request (below) to the server. If the response has a
+status of 200 and the magic application/x-git-refs content type
+then the server can be assumed to be a smart Git-aware server.
+
+
+Show Refs
+---------
+
+Obtains the available refs from the remote repository. The response
+is a sequence of refs, one per line. The actual format matches that
+of the $GIT_DIR/info/refs file normally used by a "dumb" protocol.
+
+	C: GET /path/to/repository.git/info/refs HTTP/1.0
+
+	S: HTTP/1.1 200 OK
+	S: Content-Type: application/x-git-refs
+	S: Transfer-Encoding: chunked
+	S:
+	S: 62
+	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
+	S:
+	S: 63
+	S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
+	S:
+	S: 59
+	S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/heads/pu
+	S:
+
+Push Pack
+---------
+
+Uploads a pack and updates refs. The start of the stream is the
+commands to update the refs and the remainder of the stream is the
+pack file itself. See git-receive-pack and its network protocol
+in pack-protocol.txt, as this is essentially the same.
+
+	C: POST /path/to/repository.git/receive-pack HTTP/1.0
+	C: Content-Type: application/x-git-receive-pack
+	C: Transfer-Encoding: chunked
+	C:
+	C: 103
+	C: 006395dcfa3633004da0049d3d0fa03f80589cbcaf31 d049f6c27a2244e12041955e262a404c7faba355 refs/heads/maint
+	C: 4
+	C: 0000
+	C: 12
+	C: PACK
+	...
+	C: 0
+
+	S: HTTP/1.0 200 OK
+	S: Content-type: application/x-git-receive-pack-status
+	S: Transfer-Encoding: chunked
+	S:
+	S: ...<output of receive-pack>...
+
+
diff --git a/Makefile b/Makefile
index 52c67c1..3a93bf6 100644
--- a/Makefile
+++ b/Makefile
@@ -298,6 +298,7 @@ PROGRAMS += git-unpack-file$X
 PROGRAMS += git-update-server-info$X
 PROGRAMS += git-upload-pack$X
 PROGRAMS += git-var$X
+PROGRAMS += git-http-backend$X

 # List built-in command $C whose implementation cmd_$C() is not in
 # builtin-$C.o but is linked in as part of some other command.
diff --git a/http-backend.c b/http-backend.c
new file mode 100644
index 0000000..a498f89
--- /dev/null
+++ b/http-backend.c
@@ -0,0 +1,302 @@
+#include "cache.h"
+#include "refs.h"
+#include "pkt-line.h"
+#include "object.h"
+#include "tag.h"
+#include "exec_cmd.h"
+#include "run-command.h"
+
+static const char content_type[] = "Content-Type";
+static const char content_length[] = "Content-Length";
+
+static int can_chunk;
+static char buffer[1000];
+
+static void send_status(unsigned code, const char *msg)
+{
+	size_t n;
+
+	n = snprintf(buffer, sizeof(buffer), "Status: %u %s\r\n", code, msg);
+	if (n >= sizeof(buffer))
+		die("protocol error: impossibly long header");
+	safe_write(1, buffer, n);
+}
+
+static void send_header(const char *name, const char *value)
+{
+	size_t n;
+
+	n = snprintf(buffer, sizeof(buffer), "%s: %s\r\n", name, value);
+	if (n >= sizeof(buffer))
+		die("protocol error: impossibly long header");
+	safe_write(1, buffer, n);
+}
+
+static void end_headers(void)
+{
+	safe_write(1, "\r\n", 2);
+}
+
+static void send_nocaching(void) +{ + const char *proto = getenv("SERVER_PROTOCOL"); + if (!proto || !strcmp(proto, "HTTP/1.0")) + send_header("Expires", "Mon, 17 Sep 2001 00:00:00 GMT"); + else + send_header("Cache-Control", "no-cache"); +} + +static void send_connection_close(void) +{ + send_header("Connection", "close"); +} + +static void enable_chunking(void) +{ + const char *proto = getenv("SERVER_PROTOCOL"); + + can_chunk = proto && strcmp(proto, "HTTP/1.0"); + if (can_chunk) + send_header("Transfer-Encoding", "chunked"); + else + send_connection_close(); +} + +#define hex(a) (hexchar[(a) & 15]) +static void chunked_write(const char *fmt, ...) +{ + static const char hexchar[] = "0123456789abcdef"; + va_list args; + unsigned n; + + va_start(args, fmt); + n = vsnprintf(buffer + 6, sizeof(buffer) - 8, fmt, args); + va_end(args); + if (n >= sizeof(buffer) - 8) + die("protocol error: impossibly long line"); + + if (can_chunk) { + unsigned len = n + 4, b = 4; + + buffer[4] = '\r'; + buffer[5] = '\n'; + buffer[n + 6] = '\r'; + buffer[n + 7] = '\n'; + + while (n > 0) { + buffer[--b] = hex(n); + n >>= 4; + len++; + } + + safe_write(1, buffer + b, len); + } else + safe_write(1, buffer + 6, n); +} + +static void end_chunking(void) +{ + static const char flush_chunk[] = "0\r\n\r\n"; + if (can_chunk) + safe_write(1, flush_chunk, strlen(flush_chunk)); +} + +static void NORETURN invalid_request(const char *msg) +{ + static const char header[] = "error: "; + + send_status(400, "Bad Request"); + send_header(content_type, "text/plain"); + end_headers(); + + safe_write(1, header, strlen(header)); + safe_write(1, msg, strlen(msg)); + safe_write(1, "\n", 1); + + exit(0); +} + +static void not_found(void) +{ + send_status(404, "Not Found"); + end_headers(); +} + +static void server_error(void) +{ + send_status(500, "Internal Error"); + end_headers(); +} + +static void require_content_type(const char *need_type) +{ + const char *input_type = getenv("CONTENT_TYPE"); + if 
(!input_type || strcmp(input_type, need_type)) + invalid_request("Unsupported content-type"); +} + +static void do_GET_any_file(char *name) +{ + const char *p = git_path("%s", name); + struct stat sb; + uintmax_t remaining; + size_t n; + int fd = open(p, O_RDONLY); + + if (fd < 0) { + not_found(); + return; + } + if (fstat(fd, &sb) < 0) { + close(fd); + server_error(); + die("fstat on plain file failed"); + } + remaining = (uintmax_t)sb.st_size; + + n = snprintf(buffer, sizeof(buffer), + "Content-Length: %" PRIuMAX "\r\n", remaining); + if (n >= sizeof(buffer)) + die("protocol error: impossibly long header"); + safe_write(1, buffer, n); + send_header(content_type, "application/octet-stream"); + end_headers(); + + while (remaining) { + n = xread(fd, buffer, sizeof(buffer)); + if (n < 0) + die("error reading from %s", p); + n = safe_write(1, buffer, n); + if (n <= 0) + break; + } + close(fd); +} + +static int show_one_ref(const char *name, const unsigned char *sha1, + int flag, void *cb_data) +{ + struct object *o = parse_object(sha1); + if (!o) + return 0; + + chunked_write("%s\t%s\n", sha1_to_hex(sha1), name); + if (o->type == OBJ_TAG) { + o = deref_tag(o, name, 0); + if (!o) + return 0; + chunked_write("%s\t%s^{}\n", sha1_to_hex(o->sha1), name); + } + + return 0; +} + +static void do_GET_info_refs(char *arg) +{ + send_header(content_type, "application/x-git-refs"); + send_nocaching(); + enable_chunking(); + end_headers(); + + for_each_ref(show_one_ref, NULL); + end_chunking(); +} + +static void do_GET_info_packs(char *arg) +{ + size_t objdirlen = strlen(get_object_directory()); + struct packed_git *p; + + send_nocaching(); + enable_chunking(); + end_headers(); + + prepare_packed_git(); + for (p = packed_git; p; p = p->next) { + if (!p->pack_local) + continue; + chunked_write("P %s\n", p->pack_name + objdirlen + 6); + } + chunked_write("\n"); + end_chunking(); +} + +static void do_POST_receive_pack(char *arg) +{ + 
require_content_type("application/x-git-receive-pack"); + send_header(content_type, "application/x-git-receive-pack-status"); + send_nocaching(); + send_connection_close(); + end_headers(); + + execl_git_cmd("receive-pack", + "--report-status", + "--no-advertise-heads", + ".", + NULL); + die("Failed to start receive-pack"); +} + +static struct service_cmd { + const char *method; + const char *pattern; + void (*imp)(char *); +} services[] = { + {"GET", "/info/refs$", do_GET_info_refs}, + {"GET", "/objects/info/packs", do_GET_info_packs}, + + {"GET", "/HEAD$", do_GET_any_file}, + {"GET", "/objects/../.{38}$", do_GET_any_file}, + {"GET", "/objects/pack/pack-[^/]*$", do_GET_any_file}, + {"GET", "/objects/info/[^/]*$", do_GET_any_file}, + + {"POST", "/receive-pack", do_POST_receive_pack} +}; + +int main(int argc, char **argv) +{ + char *input_method = getenv("REQUEST_METHOD"); + char *dir = getenv("PATH_TRANSLATED"); + struct service_cmd *cmd = NULL; + char *cmd_arg = NULL; + int i; + + if (!input_method) + die("No REQUEST_METHOD from server"); + if (!strcmp(input_method, "HEAD")) + input_method = "GET"; + + if (!dir) + die("No PATH_TRANSLATED from server"); + + for (i = 0; i < ARRAY_SIZE(services); i++) { + struct service_cmd *c = &services[i]; + regex_t re; + regmatch_t out[1]; + + if (strcmp(input_method, c->method)) + continue; + if (regcomp(&re, c->pattern, REG_EXTENDED)) + die("Bogus re in service table: %s", c->pattern); + if (!regexec(&re, dir, 2, out, 0)) { + size_t n = out[0].rm_eo - out[0].rm_so; + cmd = c; + cmd_arg = xmalloc(n); + strncpy(cmd_arg, dir + out[0].rm_so + 1, n); + cmd_arg[n] = 0; + dir[out[0].rm_so] = 0; + break; + } + regfree(&re); + } + + if (!cmd) + invalid_request("Unsupported query request"); + + setup_path(); + if (!enter_repo(dir, 0)) + invalid_request("Not a Git repository"); + + cmd->imp(cmd_arg); + return 0; +} -- 1.6.0.rc1.221.g9ae23 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-03 7:25 ` [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport Shawn O. Pearce @ 2008-08-03 11:38 ` H. Peter Anvin 2008-08-03 21:25 ` Shawn O. Pearce 2008-08-03 22:16 ` Junio C Hamano 1 sibling, 1 reply; 42+ messages in thread From: H. Peter Anvin @ 2008-08-03 11:38 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: git Shawn O. Pearce wrote: > +#define hex(a) (hexchar[(a) & 15]) > +static void chunked_write(const char *fmt, ...) > +{ > + static const char hexchar[] = "0123456789abcdef"; > + va_list args; > + unsigned n; > + > + va_start(args, fmt); > + n = vsnprintf(buffer + 6, sizeof(buffer) - 8, fmt, args); > + va_end(args); > + if (n >= sizeof(buffer) - 8) > + die("protocol error: impossibly long line"); > + > + if (can_chunk) { > + unsigned len = n + 4, b = 4; > + > + buffer[4] = '\r'; > + buffer[5] = '\n'; > + buffer[n + 6] = '\r'; > + buffer[n + 7] = '\n'; > + > + while (n > 0) { > + buffer[--b] = hex(n); > + n >>= 4; > + len++; > + } > + > + safe_write(1, buffer + b, len); > + } else > + safe_write(1, buffer + 6, n); > +} Maybe I am slightly confused, but I thought handling HTTP chunking for HTTP/1.1+ clients was usually done by Apache above the level of the CGI script? -hpa ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport
  2008-08-03 11:38 ` H. Peter Anvin
@ 2008-08-03 21:25 ` Shawn O. Pearce
  0 siblings, 0 replies; 42+ messages in thread
From: Shawn O. Pearce @ 2008-08-03 21:25 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: git

"H. Peter Anvin" <hpa@zytor.com> wrote:
> Shawn O. Pearce wrote:
>> +#define hex(a) (hexchar[(a) & 15])
>> +static void chunked_write(const char *fmt, ...)
>> +{
>
> Maybe I am slightly confused, but I thought handling HTTP chunking for
> HTTP/1.1+ clients was usually done by Apache above the level of the CGI
> script?

You may be right.  Apache undoes the chunking during a POST before
feeding the data to the CGI script.  If we can omit this mess of code
from git-http-backend that's a good thing.

Thanks for the sanity check.

-- 
Shawn.
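For readers following the chunking question, this is roughly what chunked_write() in the patch produces by hand: each chunk of an HTTP/1.1 chunked body is the payload length in hexadecimal, CRLF, the payload, CRLF, and the body ends with a zero-length chunk. The sketch below is only an illustration. frame_chunk() is a hypothetical helper that is not part of the patch, and it says nothing about whether the CGI or the web server should be doing the framing, which is the open question here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): frame `len` bytes of `data`
 * as one HTTP/1.1 chunk in `out`:
 *
 *     <len in hex> CRLF <data> CRLF
 *
 * Returns the number of bytes written, or 0 if `out` is too small or
 * `len` is 0 (a zero-length chunk is reserved for end-of-body).
 */
static size_t frame_chunk(char *out, size_t cap, const char *data, size_t len)
{
	int hdr;

	assert(out != NULL && data != NULL);
	if (len == 0)
		return 0;

	/* Chunk-size line; snprintf reports the length it needed. */
	hdr = snprintf(out, cap, "%zx\r\n", len);
	if (hdr < 0 || (size_t)hdr + len + 2 > cap)
		return 0;

	memcpy(out + hdr, data, len);
	memcpy(out + hdr + len, "\r\n", 2);
	return (size_t)hdr + len + 2;
}
```

Calling frame_chunk(buf, sizeof(buf), "hello", 5) fills buf with the ten bytes "5\r\nhello\r\n". Unlike the hand-rolled hex loop in the patch, the length check here covers the size line as well as the payload.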
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport
  2008-08-03  7:25 ` [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport Shawn O. Pearce
  2008-08-03 11:38 ` H. Peter Anvin
@ 2008-08-03 22:16 ` Junio C Hamano
  2008-08-04  3:59 ` Shawn O. Pearce
  1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2008-08-03 22:16 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git, H. Peter Anvin

I very much like it.

But could you be a bit more explicit than application/x-git-refs magic?  I
suspect very strongly that clueless server operators would advertise the
type on repositories statically hosted there, and would defeat the point
of your patch.

We are not changing update-server-info so if we can find a place we can
use to hide the "magic", it would be much more robust.

Perhaps a "#" comment line in info/refs that is ignored on the reading side
but update-server-info never generates on its own?  Or perhaps sort the
output differently from how update-server-info produces its output, so
that older clients would not care but the magic-aware client can notice?
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-03 22:16 ` Junio C Hamano @ 2008-08-04 3:59 ` Shawn O. Pearce 2008-08-04 9:53 ` Rogan Dawes 0 siblings, 1 reply; 42+ messages in thread From: Shawn O. Pearce @ 2008-08-04 3:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, H. Peter Anvin Junio C Hamano <gitster@pobox.com> wrote: > But could you be a bit more explicit than application/x-git-refs magic? I > suspect very strongly that clueless server operators would advertise the > type on repositories statically hosted there, and would defeat the point > of your patch. This is a very valid concern. I started to worry about it myself last night, but decided it was late enough and just wanted to start the discussion on the list, extending JH's thread even further. > Perhaps "#" comment line in info/refs that is ignored on the reading side > but update-server-info never generates on its own? This is a good idea. I think anyone who consumes info/refs does so with the understanding that "#" comment lines exist, and should be skipped, but this is not something that has been heavily tested in the wild yet. My concern here goes back to the remark you made above. What if a server owner mirrors a smart server by a non-Git aware device like wget? They will now have a copy of the info/refs content which will suggest we have Git smarts on the backend, but really it isn't there. Perhaps the smart server detection is something like: Smart Server Detection ---------------------- To detect a smart (Git-aware) server a client sends an empty POST request to info/refs; if a 200 OK response is received with the proper content type then the server can be assumed to be Git-aware, and the result contains the current info/refs data for that repository. 

    C: POST /repository.git/info/refs HTTP/1.0
    C: Content-Length: 0

    S: HTTP/1.0 200 OK
    S: Content-Type: application/x-git-refs
    S:
    S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint

Then clients should just attempt this POST first before issuing
a GET info/refs.  Non Git-aware servers will issue an error code,
and the client can retry with a standard GET request, and assume
the server isn't a newer style.

-- 
Shawn.
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-04 3:59 ` Shawn O. Pearce @ 2008-08-04 9:53 ` Rogan Dawes 2008-08-04 10:08 ` Johannes Schindelin 2008-08-04 14:48 ` Shawn O. Pearce 0 siblings, 2 replies; 42+ messages in thread From: Rogan Dawes @ 2008-08-04 9:53 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Junio C Hamano, git, H. Peter Anvin Shawn O. Pearce wrote: > Perhaps the smart server detection is something like: > > Smart Server Detection > ---------------------- > > To detect a smart (Git-aware) server a client sends an > empty POST request to info/refs; if a 200 OK response is > received with the proper content type then the server can > be assumed to be Git-aware, and the result contains the > current info/refs data for that repository. > > C: POST /repository.git/info/refs HTTP/1.0 > C: Content-Length: 0 > > S: HTTP/1.0 200 OK > S: Content-Type: application/x-git-refs > S: > S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint > > Then clients should just attempt this POST first before issuing > a GET info/refs. Non Git-aware servers will issue an error code, > and the client can retry with a standard GET request, and assume > the server isn't a newer style. > I don't understand why you would want to keep the commands in the URL when you are doing a POST? How about something like: C: POST /repository.git/ HTTP/1.0 C: Content-Length: <calculated> C: C: <whatever command you want> A dumb server will respond with: S: HTTP/1.1 405 Method not allowed (expected according to the RFC) Or S: HTTP/1.1 404 Not Found (resulting from testing against my own repo :-) ) While a smart server will respond with a "200 Ok" and the results of the command. Also, if everything is done via POST, you don't have to worry about a wget-cloned server appearing to be "smart", since no "smarts" will ever be returned in response to a GET request (and to the best of my knowledge, wget can't mirror using POST). 
Rogan
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-04 9:53 ` Rogan Dawes @ 2008-08-04 10:08 ` Johannes Schindelin 2008-08-04 10:14 ` Rogan Dawes 2008-08-04 14:48 ` Shawn O. Pearce 1 sibling, 1 reply; 42+ messages in thread From: Johannes Schindelin @ 2008-08-04 10:08 UTC (permalink / raw) To: Rogan Dawes; +Cc: Shawn O. Pearce, Junio C Hamano, git, H. Peter Anvin Hi, On Mon, 4 Aug 2008, Rogan Dawes wrote: > Shawn O. Pearce wrote: > > > Perhaps the smart server detection is something like: > > > > Smart Server Detection > > ---------------------- > > > > To detect a smart (Git-aware) server a client sends an > > empty POST request to info/refs; if a 200 OK response is > > received with the proper content type then the server can > > be assumed to be Git-aware, and the result contains the > > current info/refs data for that repository. > > > > C: POST /repository.git/info/refs HTTP/1.0 > > C: Content-Length: 0 > > > > S: HTTP/1.0 200 OK > > S: Content-Type: application/x-git-refs > > S: > > S:95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint > > > > Then clients should just attempt this POST first before issuing > > a GET info/refs. Non Git-aware servers will issue an error code, > > and the client can retry with a standard GET request, and assume > > the server isn't a newer style. > > > > I don't understand why you would want to keep the commands in the URL > when you are doing a POST? Caching. Hth, Dscho ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-04 10:08 ` Johannes Schindelin @ 2008-08-04 10:14 ` Rogan Dawes 2008-08-04 10:26 ` Johannes Schindelin 0 siblings, 1 reply; 42+ messages in thread From: Rogan Dawes @ 2008-08-04 10:14 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Shawn O. Pearce, Junio C Hamano, git, H. Peter Anvin Johannes Schindelin wrote: > Hi, > > On Mon, 4 Aug 2008, Rogan Dawes wrote: > >> I don't understand why you would want to keep the commands in the URL >> when you are doing a POST? > > Caching. > > Hth, > Dscho > If you are expecting something to be cacheable, then should you not be using a GET anyway? Anyway, from RFC 2616: > 13.10 Invalidation After Updates or Deletions > > ... > > Some HTTP methods MUST cause a cache to invalidate an entity. This is > either the entity referred to by the Request-URI, or by the Location > or Content-Location headers (if present). These methods are: > > - PUT > - DELETE > - POST This doesn't seem negotiable to me. Unless I am misunderstanding your "Caching" comment to mean "To enable caching", as opposed to "To prevent caching"? Rogan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-04 10:14 ` Rogan Dawes @ 2008-08-04 10:26 ` Johannes Schindelin 0 siblings, 0 replies; 42+ messages in thread From: Johannes Schindelin @ 2008-08-04 10:26 UTC (permalink / raw) To: Rogan Dawes; +Cc: Shawn O. Pearce, Junio C Hamano, git, H. Peter Anvin Hi, On Mon, 4 Aug 2008, Rogan Dawes wrote: > Johannes Schindelin wrote: > > > On Mon, 4 Aug 2008, Rogan Dawes wrote: > > > > > I don't understand why you would want to keep the commands in the > > > URL when you are doing a POST? > > > > Caching. > > If you are expecting something to be cacheable, then should you not be > using a GET anyway? Yes. And I think the wget thing is not an issue: we should not try to prevent every single idiocy. Ciao, Dscho ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport
  2008-08-04  9:53 ` Rogan Dawes
  2008-08-04 10:08 ` Johannes Schindelin
@ 2008-08-04 14:48 ` Shawn O. Pearce
  2008-08-04 15:45 ` Rogan Dawes
  2008-08-05  1:03 ` H. Peter Anvin
  1 sibling, 2 replies; 42+ messages in thread
From: Shawn O. Pearce @ 2008-08-04 14:48 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: Junio C Hamano, git, H. Peter Anvin

Rogan Dawes <lists@dawes.za.net> wrote:
> Shawn O. Pearce wrote:
>>
>> Smart Server Detection
>> ----------------------
>>
>> To detect a smart (Git-aware) server a client sends an
>> empty POST request to info/refs; [...]
>>
>> 	C: POST /repository.git/info/refs HTTP/1.0
>> 	C: Content-Length: 0
>
> I don't understand why you would want to keep the commands in the URL
> when you are doing a POST?

Well, as Dscho pointed out this partly has to do with caching and
the transparent dumb server functionality.  By using the command in
the URL, and having the command match that of the dumb server file,
it's easier to emulate a dumb server and also to permit caching.

Currently git-http-backend requests no caching for info/refs, but
I could see us tweaking that to permit several minutes of caching,
especially on big public sites like kernel.org.  Having info/refs
report stale by 5 minutes is not an issue when writes to there
already have a lag due to the master-slave mirroring system in use.

Because git-http-backend emulates a dumb server there is a command
dispatch table based upon the URL submitted.  Thus we already have
the command dispatch behavior implemented in the URL and doing it
in the POST body would only complicate the code further.

> Also, if everything is done via POST, you don't have to worry about a
> wget-cloned server appearing to be "smart", since no "smarts" will ever
> be returned in response to a GET request (and to the best of my
> knowledge, wget can't mirror using POST).

I think we fixed the wget-cloned server issue by requesting that
clients use POST /info/refs to identify a smart server.  A wget-cloned
repository will fail on this, and the client can fall back to GET
/info/refs and assume it must use the object walker to fetch (or
WebDAV to push).  A smart server would respond to the POST /info/refs
request correctly and the client would know it's smart.

-- 
Shawn.
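The detection rule summarized above can be sketched as a small client-side check. This is only an illustration under the assumptions made in this thread: the probe is an empty POST to info/refs, and a 200 status with the application/x-git-refs content type marks a smart server. classify_probe() and enum server_kind are hypothetical names, and the HTTP exchange itself is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type for the probe; not taken from any patch. */
enum server_kind { SERVER_DUMB, SERVER_SMART };

/*
 * Classify the reply to "POST <repo>/info/refs" with an empty body.
 * `status` is the HTTP status code, `content_type` the Content-Type
 * header value (NULL if the header was absent).
 *
 * Anything other than 200 plus the magic type means "dumb": the caller
 * would then retry with a plain GET info/refs and use the object walker.
 * The comparison is case-sensitive, which is a simplification.
 */
static enum server_kind classify_probe(int status, const char *content_type)
{
	static const char smart_type[] = "application/x-git-refs";
	const size_t n = sizeof(smart_type) - 1;

	assert(status >= 100 && status <= 599);

	if (status != 200 || !content_type)
		return SERVER_DUMB;
	if (strncmp(content_type, smart_type, n))
		return SERVER_DUMB;
	/* Accept trailing parameters ("; charset=..."), reject longer names. */
	if (content_type[n] != '\0' && content_type[n] != ';' &&
	    content_type[n] != ' ')
		return SERVER_DUMB;
	return SERVER_SMART;
}
```

A static file server typically answers the POST with 405 or 404, so classify_probe(405, NULL) yields SERVER_DUMB. A wget-made mirror fails the same way, because it has no handler for POST even if it serves info/refs with the magic type on GET.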
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport
  2008-08-04 14:48 ` Shawn O. Pearce
@ 2008-08-04 15:45 ` Rogan Dawes
  2008-08-04 15:59 ` Shawn O. Pearce
  2008-08-05  1:03 ` H. Peter Anvin
  1 sibling, 1 reply; 42+ messages in thread
From: Rogan Dawes @ 2008-08-04 15:45 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git, H. Peter Anvin

Shawn O. Pearce wrote:
> Rogan Dawes <lists@dawes.za.net> wrote:
>> Shawn O. Pearce wrote:
>>> Smart Server Detection
>>> ----------------------
>>>
>>> To detect a smart (Git-aware) server a client sends an
>>> empty POST request to info/refs; [...]
>>>
>>> 	C: POST /repository.git/info/refs HTTP/1.0
>>> 	C: Content-Length: 0
>> I don't understand why you would want to keep the commands in the URL
>> when you are doing a POST?
>
> Well, as Dscho pointed out this partly has to do with caching and
> the transparent dumb server functionality.  By using the command in
> the URL, and having the command match that of the dumb server file,
> it's easier to emulate a dumb server and also to permit caching.
>
> Currently git-http-backend requests no caching for info/refs, but
> I could see us tweaking that to permit several minutes of caching,
> especially on big public sites like kernel.org.  Having info/refs
> report stale by 5 minutes is not an issue when writes to there
> already have a lag due to the master-slave mirroring system in use.

Fair enough, but what about the quote from RFC2616 that I posted in
rebuttal to Dscho?

> 13.10 Invalidation After Updates or Deletions
>
> ...
>
> Some HTTP methods MUST cause a cache to invalidate an entity. This is
> either the entity referred to by the Request-URI, or by the Location
> or Content-Location headers (if present). These methods are:
>
>   - PUT
>   - DELETE
>   - POST

This doesn't seem negotiable to me.

For those resources that are expected to be cacheable, the request
should be made using a GET.
> Because git-http-backend emulates a dumb server there is a command > dispatch table based upon the URL submitted. Thus we already have > the command dispatch behavior implemented in the URL and doing it > in the POST body would only complicate the code further. Not by a huge amount, surely? if (method == "GET") command = ... else if (method == "POST") command = ... dispatch(command); Rogan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport
  2008-08-04 15:45 ` Rogan Dawes
@ 2008-08-04 15:59 ` Shawn O. Pearce
  2008-08-04 16:18 ` Rogan Dawes
  0 siblings, 1 reply; 42+ messages in thread
From: Shawn O. Pearce @ 2008-08-04 15:59 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: Junio C Hamano, git, H. Peter Anvin

Rogan Dawes <lists@dawes.za.net> wrote:
> Shawn O. Pearce wrote:
>> Currently git-http-backend requests no caching for info/refs [...]
>
> Fair enough, but what about the quote from RFC2616 that I posted in
> rebuttal to Dscho?
>
> > 13.10 Invalidation After Updates or Deletions
> >
> > ...
> >
> > Some HTTP methods MUST cause a cache to invalidate an entity. This is
> > either the entity referred to by the Request-URI, or by the Location
> > or Content-Location headers (if present). These methods are:
> >
> >   - PUT
> >   - DELETE
> >   - POST
>
> This doesn't seem negotiable to me.

It's not negotiable.  POST requires no caching.  End of discussion.

> For those resources that are expected to be cacheable, the request
> should be made using a GET.

That's exactly what we are doing.  Where caching is reasonable we are
using a GET request.  Where caching cannot be performed as the server
state is changing (e.g. actually updating refs) we are using POST.
That is entirely within the guidelines of the RFC.

However we are "abusing" POST for "POST /info/refs" to detect a
Git-aware HTTP server.  Sending POST to a static resource should
always fail.

>> Because git-http-backend emulates a dumb server there is a command
>> dispatch table based upon the URL submitted.  Thus we already have
>> the command dispatch behavior implemented in the URL and doing it
>> in the POST body would only complicate the code further.
>
> Not by a huge amount, surely?
>
> if (method == "GET") command = ...
> else if (method == "POST") command = ...
> dispatch(command);

Well, true, we could do that.  But then we have to break the
command name out of the input stream.
In some cases we may just be
exec'ing another Git process and letting it handle the input stream.
Shoving the command name into the start of it just makes it that
much harder to parse out.

We already have to handle splitting PATH_TRANSLATED into a pair of
(GIT_DIR, command) so we can handle that for a GET.  We might as
well just use that very same code for POST to select the command.

Besides, by placing the command name in the URL, server admins
can use regex filters in their configurations to control access.
If we shove the command name into the body of a POST they cannot
do this.

I can see sites wanting to offer anonymous smart fetch, but require
password protected smart push on the same repository URL.  Slapping
in a directive like:

	<Location ~ ^/git/.*/receive-pack$>
		require valid-user
		...
	</Location>

would easily make Apache implement this for us.  Most modern HTTP
servers should be able to be configured like this.

One of the problems with these RPC-in-HTTP systems is always the
fact that the true nature of the action isn't visible in the method
and URL, causing servers and proxies to have to parse the stream to
implement firewall rules.  Or to provide access control.  I'm trying
to reuse as much of the access control support as possible from the
HTTP server and put as little of it as possible into the backend CGI.

Since the backend CGI is based upon git-receive-pack itself admins
can use the standard pre-receive/update hook pair to manage branch
level security in a repository, while gross-level read/write can
be done in the server.

-- 
Shawn.
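The split of PATH_TRANSLATED into a (GIT_DIR, command) pair described above can be sketched as follows. It is a simplification, not the patch's code: it compares plain suffixes against a small hypothetical table, where the RFC patch walks a regex service table, and split_command() and known_commands are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command suffixes; the RFC patch uses regex patterns. */
static const char *const known_commands[] = {
	"/info/refs",
	"/receive-pack",
};

/*
 * If `path` ends in one of the known command suffixes, cut the suffix
 * off in place so that `path` holds only the repository directory, and
 * return a pointer to the command name (without its leading slash).
 * Returns NULL and leaves `path` untouched when nothing matches.
 */
static char *split_command(char *path)
{
	size_t plen, i;

	assert(path != NULL);
	plen = strlen(path);

	for (i = 0; i < sizeof(known_commands) / sizeof(known_commands[0]); i++) {
		size_t clen = strlen(known_commands[i]);

		/* Require a non-empty directory part before the suffix. */
		if (plen > clen && !strcmp(path + plen - clen, known_commands[i])) {
			path[plen - clen] = '\0';	/* overwrite the '/' */
			return path + plen - clen + 1;
		}
	}
	return NULL;
}
```

Given "/srv/git/repo.git/info/refs" (a made-up path), the buffer is cut down to "/srv/git/repo.git" and the returned pointer reads "info/refs". The same split serves GET and POST alike, which is the point made above: the web server can apply its access rules to the URL without looking at the request body.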
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport
  2008-08-04 15:59 ` Shawn O. Pearce
@ 2008-08-04 16:18 ` Rogan Dawes
  0 siblings, 0 replies; 42+ messages in thread
From: Rogan Dawes @ 2008-08-04 16:18 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git, H. Peter Anvin

Shawn O. Pearce wrote:
> Rogan Dawes <lists@dawes.za.net> wrote:
>> Shawn O. Pearce wrote:
>>> Currently git-http-backend requests no caching for info/refs [...]
>> Fair enough, but what about the quote from RFC2616 that I posted in
>> rebuttal to Dscho?
>>
>>> 13.10 Invalidation After Updates or Deletions
>>>
>>> ...
>>>
>>> Some HTTP methods MUST cause a cache to invalidate an entity. This is
>>> either the entity referred to by the Request-URI, or by the Location
>>> or Content-Location headers (if present). These methods are:
>>>
>>>   - PUT
>>>   - DELETE
>>>   - POST
>> This doesn't seem negotiable to me.
>
> It's not negotiable.  POST requires no caching.  End of discussion.

Aha. So now I see the objective. I had misunderstood the intention to be
to *allow* caching of POST'ed resources.

>> For those resources that are expected to be cacheable, the request
>> should be made using a GET.
>
> That's exactly what we are doing.  Where caching is reasonable we are
> using a GET request.  Where caching cannot be performed as the server
> state is changing (e.g. actually updating refs) we are using POST.
> That is entirely within the guidelines of the RFC.
>
> However we are "abusing" POST for "POST /info/refs" to detect a
> Git-aware HTTP server.  Sending POST to a static resource should
> always fail.

Right. Either with a "405 Method Not Allowed" or a "404 Not Found",
as I discovered.

>>> Because git-http-backend emulates a dumb server there is a command
>>> dispatch table based upon the URL submitted.  Thus we already have
>>> the command dispatch behavior implemented in the URL and doing it
>>> in the POST body would only complicate the code further.
>> Not by a huge amount, surely? >> >> if (method == "GET") command = ... >> else if (method == "POST") command = ... >> dispatch(command); > > Well, true, we could do that. But then we have to break the > command name out of the input stream. In some cases we may just be > exec'ing another Git process and letting it handle the input stream. > Shoving the command name into the start of it just makes it that > much harder to parse out. Fair enough. I had not thought about other uses for the input stream. > One of the problems with these RPC-in-HTTP systems is always the > fact that the true nature of the action isn't visible in the method > and URL, causing servers and proxies to have to parse the stream to > implement firewall rules. Or to provide access control. I'm trying > to reuse as much of the access control support as possible from the > HTTP server and put as little of it as possible into the backend CGI. > > Since the backend CGI is based upon git-receive-pack itself admins > can use the standard pre-receive/update hook pair to manage branch > level security in a repository, while gross-level read/write can > be done in the server. Works for me! Thanks for doing all the hard thinking for this feature :-) Rogan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-04 14:48 ` Shawn O. Pearce 2008-08-04 15:45 ` Rogan Dawes @ 2008-08-05 1:03 ` H. Peter Anvin 2008-08-05 1:24 ` Shawn O. Pearce 1 sibling, 1 reply; 42+ messages in thread From: H. Peter Anvin @ 2008-08-05 1:03 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Rogan Dawes, Junio C Hamano, git Shawn O. Pearce wrote: > > Currently git-http-backend requests no caching for info/refs, but > I could see us tweaking that to permit several minutes of caching, > especially on big public sites like kernel.org. Having info/refs > report stale by 5 minutes is not an issue when writes to there > already have a lag due to the master-slave mirroring system in use. > > Because git-http-backend emulates a dumb server there is a command > dispatch table based upon the URL submitted. Thus we already have > the command dispatch behavior implemented in the URL and doing it > in the POST body would only complicate the code further. > Let's put it this way: we're not seeing a huge amount of load from git protocol requests, and I'm going to assume "git+http" protocol to be used only by sites behind braindamaged firewalls (everyone else would use git protocol), so I'm not really all that worried about it. I'm not sure if "emulating a dumb server" is desirable at all; it seems like it would at least in part defeat the purpose of minimizing the transaction count and otherwise be as much of a "smart" server as the medium permits. -hpa ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-05 1:03 ` H. Peter Anvin @ 2008-08-05 1:24 ` Shawn O. Pearce 2008-08-05 1:35 ` H. Peter Anvin 0 siblings, 1 reply; 42+ messages in thread From: Shawn O. Pearce @ 2008-08-05 1:24 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Rogan Dawes, Junio C Hamano, git "H. Peter Anvin" <hpa@zytor.com> wrote: > Shawn O. Pearce wrote: >> >> Currently git-http-backend requests no caching for info/refs [...] > > Let's put it this way: we're not seeing a huge amount of load from git > protocol requests, and I'm going to assume "git+http" protocol to be > used only by sites behind braindamaged firewalls (everyone else would > use git protocol), so I'm not really all that worried about it. Agreed. There's another application I want git+http for, but that may never materialize. Or maybe it will someday. I just have to adopt a wait and see approach there. > I'm not sure if "emulating a dumb server" is desirable at all; it seems > like it would at least in part defeat the purpose of minimizing the > transaction count and otherwise be as much of a "smart" server as the > medium permits. I think it is a really good idea. Then clients don't have to worry about which HTTP URL is the "correct" one for them to be using. End users will just magically get the smart git+http variant if both sides support it and they need to use HTTP due to firewalls. Clients will fall back onto the dumb protocol if the server doesn't support smart clones. Older clients (pre git+http) will still be able to talk to a smart server, just slower. This is nice for the end user. No thinking is required. Never ask a human to do what a machine can do in less time. I think it's just 1 extra HTTP hit per fetch/push done against a dumb server. On a smart server that first hit will also give us what we need to begin the conversation (the info/refs data). On a dumb server it's a wasted hit, but a dumb server is already going to suck. 
One extra HTTP request against a dumb server is a drop in the bucket. It's also a pretty small request (an empty POST). -- Shawn. ^ permalink raw reply [flat|nested] 42+ messages in thread
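The probe-and-fall-back behaviour described in this message could look something like the sketch below. It is a guess at the client-side decision only, written in Python with no network code; the names ServerKind, classify_probe and next_step are hypothetical, and treating every non-2xx status as "dumb" is an assumption of this example rather than something the thread specifies.

```python
from enum import Enum

class ServerKind(Enum):
    SMART = "smart"
    DUMB = "dumb"

def classify_probe(status: int) -> ServerKind:
    """Interpret the HTTP status returned for the empty POST to info/refs."""
    if 200 <= status < 300:
        # A Git-aware server answered; its reply already carries the
        # info/refs data needed to start the conversation.
        return ServerKind.SMART
    # A static file server is expected to reject the POST (e.g. 404 or 405).
    return ServerKind.DUMB

def next_step(status: int) -> str:
    """Describe what the client would do after the probe."""
    if classify_probe(status) is ServerKind.SMART:
        return "continue with smart git+http requests"
    return "fall back to dumb protocol: GET info/refs"
```

On a dumb server the cost is the one wasted request mentioned above; on a smart server the probe doubles as the first useful exchange.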
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-05 1:24 ` Shawn O. Pearce @ 2008-08-05 1:35 ` H. Peter Anvin 2008-08-05 1:57 ` Shawn O. Pearce 0 siblings, 1 reply; 42+ messages in thread From: H. Peter Anvin @ 2008-08-05 1:35 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Rogan Dawes, Junio C Hamano, git Shawn O. Pearce wrote: > >> I'm not sure if "emulating a dumb server" is desirable at all; it seems >> like it would at least in part defeat the purpose of minimizing the >> transaction count and otherwise be as much of a "smart" server as the >> medium permits. > > I think it is a really good idea. Then clients don't have to worry > about which HTTP URL is the "correct" one for them to be using. > End users will just magically get the smart git+http variant if > both sides support it and they need to use HTTP due to firewalls. > Clients will fall back onto the dumb protocol if the server doesn't > support smart clones. Older clients (pre git+http) will still be > able to talk to a smart server, just slower. This is nice for the > end user. No thinking is required. > > Never ask a human to do what a machine can do in less time. > > I think it's just 1 extra HTTP hit per fetch/push done against > a dumb server. On a smart server that first hit will also give > us what we need to begin the conversation (the info/refs data). > On a dumb server it's a wasted hit, but a dumb server is already > going to suck. One extra HTTP request against a dumb server is a > drop in the bucket. It's also a pretty small request (an empty POST). > Not arguing that URL compatibility isn't a good thing, but there are other ways to accomplish it, too. After detecting either a smart or dumb server, we can use a redirect to point them to a different URL, as appropriate. 
Furthermore, in the case of round-robin sites like kernel.org, this is actually *mandatory* in the case of a stateful server (we need a redirect to a server-specific URL), and highly recommended in the case of a stateless server (because of potential skew.) -hpa ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-05 1:35 ` H. Peter Anvin @ 2008-08-05 1:57 ` Shawn O. Pearce 2008-08-05 2:02 ` H. Peter Anvin 0 siblings, 1 reply; 42+ messages in thread From: Shawn O. Pearce @ 2008-08-05 1:57 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Rogan Dawes, Junio C Hamano, git "H. Peter Anvin" <hpa@zytor.com> wrote: > Shawn O. Pearce wrote: >> >> I think it is a really good idea. Then clients don't have to worry >> about which HTTP URL is the "correct" one for them to be using. > > Not arguing that URL compatibility isn't a good thing, but there are > other ways to accomplish it, too. After detecting either a smart or > dumb server, we can use a redirect to point them to a different URL, as > appropriate. I'm not sure this is necessary. Of course it all comes down to "how does an admin map Git repositories into the URL space of the server"? I thought it would be simple if the admin was able to map repositories using a ScriptAlias and allow the server to perform path info translation to give us the filesystem location of the repository. Then we don't have to configure our own map of the available Git repositories. Once you do that though you now have the URL space associated with that repository served by a CGI. For older clients we need to either serve them the file, or issue a redirect to serve the file. The redirect is messy because we need some configuration to explain where the files are available in the server's URL space. Or you go the other way, and have newer git+http clients try to find the git aware server by a redirect. Again we have to explain where that git aware server is in the URL space of the server. *sigh* > Furthermore, in the case of round-robin sites like kernel.org, this is > actually *mandatory* in the case of a stateful server (we need a > redirect to a server-specific URL), and highly recommended in the case > of a stateless server (because of potential skew.) 
Well, the git+http protocol will hold all state in the client, making each RPC a stateless RPC operation. The only issue is then dealing with skew in a server farm. I guess we need to ask client implementations to honor a redirect on the first request and reuse that new base URL for all subsequent requests that are part of the same "operation". Then server farms can issue a redirect to a server-specific hostname if a client comes in with a round-robin DNS hostname, thus ensuring that for this current operation there isn't skew. -- Shawn. ^ permalink raw reply [flat|nested] 42+ messages in thread
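The "honor a redirect on the first request and reuse that base URL" rule proposed in this message might be sketched like this. It is a minimal Python illustration under stated assumptions: the Operation class, its method names, the example hostnames, and the choice to treat the Location value as the new base URL are all hypothetical and not taken from any git client.

```python
from typing import Optional
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307)

def _as_base(url: str) -> str:
    # urljoin treats the last path segment as a file unless it ends in "/".
    return url if url.endswith("/") else url + "/"

class Operation:
    """One fetch or push; pins the base URL chosen by the first response."""

    def __init__(self, base_url: str) -> None:
        self.base_url = _as_base(base_url)
        self.first_request_done = False

    def url_for(self, name: str) -> str:
        """Build the URL for a sub-request such as info/refs."""
        return urljoin(self.base_url, name)

    def record_response(self, status: int, location: Optional[str] = None) -> None:
        """Adopt a redirect target, but only on the first response."""
        if (not self.first_request_done
                and status in REDIRECT_STATUSES and location):
            self.base_url = _as_base(urljoin(self.base_url, location))
        self.first_request_done = True
```

With a round-robin hostname, the first response would move the client to a server-specific name, and later redirects within the same operation would be ignored, so all remaining requests see one server's view of the repository.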
* Re: [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-05 1:57 ` Shawn O. Pearce @ 2008-08-05 2:02 ` H. Peter Anvin 2008-08-13 1:56 ` H. Peter Anvin 0 siblings, 1 reply; 42+ messages in thread From: H. Peter Anvin @ 2008-08-05 2:02 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Rogan Dawes, Junio C Hamano, git Shawn O. Pearce wrote: > > I guess we need to ask client implementations to honor a redirect > on the first request and reuse that new base URL for all subsequent > requests that are part of the same "operation". Then server farms > can issue a redirect to a server-specific hostname if a client > comes in with a round-robin DNS hostname, thus ensuring that for > this current operation there isn't skew. > Either that, or you can pass a "chase URL" in the payload of the request... it's more or less the same concept. -hpa ^ permalink raw reply [flat|nested] 42+ messages in thread
* Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-05 2:02 ` H. Peter Anvin @ 2008-08-13 1:56 ` H. Peter Anvin 2008-08-13 2:37 ` Shawn O. Pearce 0 siblings, 1 reply; 42+ messages in thread From: H. Peter Anvin @ 2008-08-13 1:56 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Rogan Dawes, Junio C Hamano, git Anything we can do to keep this moving forward? I was extremely encouraged with the fast progress on this; this would be great to get to the point where we (kernel.org) can deploy it at least for testing. -hpa ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-13 1:56 ` H. Peter Anvin @ 2008-08-13 2:37 ` Shawn O. Pearce 0 siblings, 0 replies; 42+ messages in thread From: Shawn O. Pearce @ 2008-08-13 2:37 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Rogan Dawes, Junio C Hamano, git "H. Peter Anvin" <hpa@zytor.com> wrote: > Anything we can do to keep this moving forward? I was extremely > encouraged with the fast progress on this; this would be great to get to > the point where we (kernel.org) can deploy it at least for testing. Sorry, I dropped it with my egit work. I'll pick it up again and try to continue it further. I left off trying to implement the push client and saying "damn, jgit is better structured to make this sort of change than C git" and decided it was too late at night to continue it more. That was like a week ago. -- Shawn. ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <200808130326.m7D3Pr2V000918@terminus.zytor.com>]
[parent not found: <20080813032812.GD5855@spearce.org>]
[parent not found: <48A262B9.8020608@zytor.com>]
* Re: Add Git-aware CGI for Git-aware smart HTTP transport [not found] ` <48A262B9.8020608@zytor.com> @ 2008-08-13 14:53 ` Shawn O. Pearce 2008-08-13 15:41 ` H. Peter Anvin 0 siblings, 1 reply; 42+ messages in thread
From: Shawn O. Pearce @ 2008-08-13 14:53 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: git

[...added git ML back to thread...]

"H. Peter Anvin" <hpa@zytor.com> wrote:
> Shawn O. Pearce wrote:
>> The plan I've proposed requires wedging the CGI in between the HTTP
>> server and the repository files. Which means older dumb clients
>> get data by forking off the CGI, rather than letting the HTTP server
>> stream the file itself.
>
> Yeah, that's quite a bit unfortunate, because it means some potentially
> very expensive buffering in Apache. That's one reason to do some kind
> of redirection.

Hmm. So what if the "smart" protocol used a redirect to the CGI
and the dumb protocol didn't use any redirects at all? I say this
because I think the dumb protocol won't handle redirects well.
It will do them, but it would incur a redirect on every request
it makes.

So if we have the "smart" protocol perform detection by trying:

  C: HEAD /path/to/repository.git/git-http-backend HTTP/1.0

  S: HTTP/1.0 302 Found
  S: Location: /git-http/path/to/repository.git

Under Apache this server configuration can be easily handled by a
mod_rewrite regex:

  RewriteRule ^(/pub/scm/.*)/git-http-backend$ /git/$1 [R,L]
  ScriptAlias /git/ /path/to/git-http-backend/

Individual users could also install the git-http-backend CGI right
into their repository, in which case the CGI if invoked with no
PATH_INFO can do a redirect back to itself to indicate where GIT_DIR
is:

  C: HEAD /path/to/repository.git/git-http-backend HTTP/1.0

  S: HTTP/1.0 302 Found
  S: Location: /path/to/repository.git/git-http-backend/.

Individual operations can be selected by appending on the operation
name, so <Location ~ > style rules can be used to apply access
controls, such as:

  # Disallow push to any smart repository via ScriptAlias
  #
  <Location ~ ^/git/.*/receive-pack$>
    Order Deny,Allow
    Deny from all
  </Location>

  # Disallow push to any smart repository with CGI in tree.
  #
  <Location ~ .*/git-http-backend/./receive-pack$>
    Order Deny,Allow
    Deny from all
  </Location>

Setting this up on a server which doesn't have the power of mod_regex
available would be tricky, as you need to link the CGI into every
single repository you are serving. I don't know (or use) many other
HTTP servers beyond Apache so I'm not sure if they can do this.

-- 
Shawn.

^ permalink raw reply [flat|nested] 42+ messages in thread
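To see which request paths the two <Location ~ > patterns in this message would cover, they can be tried out with Python's re module, as in the sketch below. The regular expressions are copied from the message; the function name push_denied and all example paths are made up for illustration, and Python's regex engine is assumed to behave like Apache's for patterns this simple.

```python
import re

# The two patterns from the <Location ~ ...> blocks in the message.
DENY_PUSH = [
    re.compile(r"^/git/.*/receive-pack$"),
    re.compile(r".*/git-http-backend/./receive-pack$"),
]

def push_denied(path: str) -> bool:
    """True if either deny rule would apply to this URL path."""
    # search() is used because the patterns carry their own anchors.
    return any(pattern.search(path) for pattern in DENY_PUSH)

# Hypothetical request paths:
#   /git/pub/scm/linux.git/receive-pack           -> denied (ScriptAlias case)
#   /home/u/repo.git/git-http-backend/./receive-pack -> denied (CGI in tree)
#   /git/pub/scm/linux.git/info/refs              -> not denied
```

Note that the "." between the slashes in the second pattern is an unescaped regex dot, so it matches any single character and not only a literal ".".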
* Re: Add Git-aware CGI for Git-aware smart HTTP transport 2008-08-13 14:53 ` Shawn O. Pearce @ 2008-08-13 15:41 ` H. Peter Anvin 0 siblings, 0 replies; 42+ messages in thread From: H. Peter Anvin @ 2008-08-13 15:41 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: git Shawn O. Pearce wrote: > > Hmm. So what if the "smart" protocol used a redirect to the CGI > and the dumb protocol didn't use any redirects at all? I say this > because I think the dumb protocol won't handle redirects well. > It will do them, but it would incur a redirect on every request > it makes. > That's preferable anyway, in my opinion. > Setting this up on a server which doesn't have the power of mod_regex > available would be tricky, as you need to link the CGI into every > single repository you are serving. I don't know (or use) many other > HTTP servers beyond Apache so I'm not sure if they can do this. Many can, and even more can if we instead of git-http-backend had something which looked vaguely like a unique extension, like "backend.git-http" -hpa ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2008-08-13 15:43 UTC | newest]

Thread overview: 42+ messages
2008-08-01 21:50 More on git over HTTP POST H. Peter Anvin
2008-08-02 20:57 ` Shawn O. Pearce
2008-08-02 21:00 ` Daniel Stenberg
2008-08-02 21:08 ` Shawn O. Pearce
2008-08-02 21:23 ` Petr Baudis
2008-08-02 21:32 ` Shawn O. Pearce
2008-08-03 2:56 ` Shawn O. Pearce
2008-08-03 3:27 ` Junio C Hamano
2008-08-03 3:31 ` Shawn O. Pearce
2008-08-03 3:47 ` H. Peter Anvin
2008-08-03 4:10 ` Shawn O. Pearce
2008-08-03 8:10 ` david
2008-08-03 11:42 ` H. Peter Anvin
2008-08-03 11:29 ` H. Peter Anvin
2008-08-03 3:51 ` H. Peter Anvin
2008-08-03 4:12 ` Shawn O. Pearce
2008-08-03 11:31 ` H. Peter Anvin
2008-08-03 4:01 ` H. Peter Anvin
2008-08-03 6:43 ` Mike Hommey
2008-08-03 7:25 ` [RFC 1/2] Add backdoor options to receive-pack for use in Git-aware CGI Shawn O. Pearce
2008-08-03 7:25 ` [RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport Shawn O. Pearce
2008-08-03 11:38 ` H. Peter Anvin
2008-08-03 21:25 ` Shawn O. Pearce
2008-08-03 22:16 ` Junio C Hamano
2008-08-04 3:59 ` Shawn O. Pearce
2008-08-04 9:53 ` Rogan Dawes
2008-08-04 10:08 ` Johannes Schindelin
2008-08-04 10:14 ` Rogan Dawes
2008-08-04 10:26 ` Johannes Schindelin
2008-08-04 14:48 ` Shawn O. Pearce
2008-08-04 15:45 ` Rogan Dawes
2008-08-04 15:59 ` Shawn O. Pearce
2008-08-04 16:18 ` Rogan Dawes
2008-08-05 1:03 ` H. Peter Anvin
2008-08-05 1:24 ` Shawn O. Pearce
2008-08-05 1:35 ` H. Peter Anvin
2008-08-05 1:57 ` Shawn O. Pearce
2008-08-05 2:02 ` H. Peter Anvin
2008-08-13 1:56 ` H. Peter Anvin
2008-08-13 2:37 ` Shawn O. Pearce
[not found] <200808130326.m7D3Pr2V000918@terminus.zytor.com>
[not found] ` <20080813032812.GD5855@spearce.org>
[not found] ` <48A262B9.8020608@zytor.com>
2008-08-13 14:53 ` Shawn O. Pearce
2008-08-13 15:41 ` H. Peter Anvin