* [PATCH 0/3] make smart-http more robust against bogus server input
@ 2013-02-16 6:44 Jeff King
2013-02-16 6:46 ` [PATCH 1/3] pkt-line: teach packet_get_line a no-op mode Jeff King
` (2 more replies)
0 siblings, 3 replies; 50+ messages in thread
From: Jeff King @ 2013-02-16 6:44 UTC (permalink / raw)
To: git; +Cc: Shawn O. Pearce
For the most part, smart-http just passes data to fetch-pack and
send-pack, which take care of the heavy lifting. However, I did find a
few corner cases around truncated data from the server, one of which can
actually cause a deadlock.
I found these because I was trying to figure out what was going on with
some hung git processes which were in a deadlock like the one described
in patch 3. But having experimented and read the code, I don't think
that it is triggerable from a normal clone, but rather only when you
poke git-remote-curl in the right way. So it may or may not be my
culprit, but these patches do make remote-curl more robust, which is a
good thing.
[1/3]: pkt-line: teach packet_get_line a no-op mode
[2/3]: remote-curl: verify smart-http metadata lines
[3/3]: remote-curl: sanity check ref advertisement from server
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH 1/3] pkt-line: teach packet_get_line a no-op mode
2013-02-16 6:44 [PATCH 0/3] make smart-http more robust against bogus server input Jeff King
@ 2013-02-16 6:46 ` Jeff King
2013-02-17 10:41 ` Jonathan Nieder
2013-02-16 6:47 ` [PATCH 2/3] remote-curl: verify smart-http metadata lines Jeff King
2013-02-16 6:49 ` [PATCH 3/3] remote-curl: sanity check ref advertisement from server Jeff King
2 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-16 6:46 UTC (permalink / raw)
To: git; +Cc: Shawn O. Pearce
You can use packet_get_line to parse a single packet out of
a stream and into a buffer. However, if you just want to
throw away a set of packets from the stream, there's no need
to even bother copying the bytes. This patch treats a NULL
output buffer as a hint that the caller does not even want
to see the output.
We have to tweak the packet_trace call, too, since it showed
the trace from the copied buffer, which now might not exist.
The new code is actually more correct, though, as it shows
just what we parsed, not any cruft that may have been in the
output buffer before (it never mattered, though, because all
callers gave us a fresh buffer).
Signed-off-by: Jeff King <peff@peff.net>
---
pkt-line.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/pkt-line.c b/pkt-line.c
index eaba15f..7f28701 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -234,9 +234,10 @@ int packet_get_line(struct strbuf *out,
*src_len -= 4;
len -= 4;
- strbuf_add(out, *src_buf, len);
+ if (out)
+ strbuf_add(out, *src_buf, len);
+ packet_trace(*src_buf, len, 0);
*src_buf += len;
*src_len -= len;
- packet_trace(out->buf, out->len, 0);
return len;
}
--
1.8.1.2.11.g1a2f572
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 2/3] remote-curl: verify smart-http metadata lines
2013-02-16 6:44 [PATCH 0/3] make smart-http more robust against bogus server input Jeff King
2013-02-16 6:46 ` [PATCH 1/3] pkt-line: teach packet_get_line a no-op mode Jeff King
@ 2013-02-16 6:47 ` Jeff King
2013-02-17 10:49 ` Jonathan Nieder
2013-02-16 6:49 ` [PATCH 3/3] remote-curl: sanity check ref advertisement from server Jeff King
2 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-16 6:47 UTC (permalink / raw)
To: git; +Cc: Shawn O. Pearce
A smart http ref advertisement starts with a packet
containing the service header, followed by an arbitrary
number of packets containing other metadata headers,
followed by a flush packet.
We don't currently recognize any other metadata headers, so
we just parse through any extra packets, throwing away their
contents. However, we don't do so very carefully, and just
stop at the first error or flush packet.
Let's flag any errors we see here, which might be a sign of
truncated or corrupted output. Since the rest of the data
should be the ref advertisement, and since we pass that
along to our helper programs (like fetch-pack), they will
probably notice the error, as whatever cruft is in the
buffer will not parse. However, it's nice to note problems
as early as possible, which can help in debugging the root
cause.
Signed-off-by: Jeff King <peff@peff.net>
---
remote-curl.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/remote-curl.c b/remote-curl.c
index 933c69a..73134f5 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -90,6 +90,17 @@ static void free_discovery(struct discovery *d)
}
}
+static int read_packets_until_flush(char **buf, size_t *len)
+{
+ while (1) {
+ int r = packet_get_line(NULL, buf, len);
+ if (r < 0)
+ return -1;
+ if (r == 0)
+ return 0;
+ }
+}
+
static struct discovery* discover_refs(const char *service)
{
struct strbuf exp = STRBUF_INIT;
@@ -155,11 +166,13 @@ static struct discovery* discover_refs(const char *service)
/* The header can include additional metadata lines, up
* until a packet flush marker. Ignore these now, but
- * in the future we might start to scan them.
+ * in the future we might start to scan them. However, we do
+ * still check to make sure we are getting valid packet lines,
+ * ending with a flush.
*/
- strbuf_reset(&buffer);
- while (packet_get_line(&buffer, &last->buf, &last->len) > 0)
- strbuf_reset(&buffer);
+ if (read_packets_until_flush(&last->buf, &last->len) < 0)
+ die("smart-http metadata lines are invalid at %s",
+ refs_url);
last->proto_git = 1;
}
--
1.8.1.2.11.g1a2f572
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 3/3] remote-curl: sanity check ref advertisement from server
2013-02-16 6:44 [PATCH 0/3] make smart-http more robust against bogus server input Jeff King
2013-02-16 6:46 ` [PATCH 1/3] pkt-line: teach packet_get_line a no-op mode Jeff King
2013-02-16 6:47 ` [PATCH 2/3] remote-curl: verify smart-http metadata lines Jeff King
@ 2013-02-16 6:49 ` Jeff King
2013-02-17 11:05 ` Jonathan Nieder
2 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-16 6:49 UTC (permalink / raw)
To: git; +Cc: Shawn O. Pearce
If the smart HTTP response from the server is truncated for
any reason, we will get an incomplete ref advertisement. If
we then feed this incomplete list to "fetch-pack", one of a
few things may happen:
1. If the truncation is in a packet header, fetch-pack
will notice the bogus line and complain.
2. If the truncation is inside a packet, fetch-pack will
keep waiting for us to send the rest of the packet,
which we never will.
3. If the truncation is at a packet boundary, fetch-pack
will keep waiting for us to send the next packet, which
we never will.
As a result, fetch-pack hangs, waiting for input. However,
remote-curl believes it has sent all of the advertisement,
and therefore waits for fetch-pack to speak. The two
processes end up in a deadlock.
This fortunately doesn't happen in the normal fetching
workflow, because git-fetch first uses the "list" command,
which feeds the refs to get_remote_heads, which does notice
the error. However, you can trigger it by sending a direct
"fetch" to the remote-curl helper.
We can make this more robust by verifying that the packet
stream we got from the server does indeed parse correctly
and ends with a flush packet, which means that what
fetch-pack receives will at least be syntactically correct.
The normal non-stateless-rpc case does not have to deal with
this problem; it detects a truncation by getting EOF on the
file descriptor before it has read all data. So it is
tempting to think that we can solve this by closing the
descriptor after relaying the server's advertisement.
Unfortunately, in the stateless rpc case, we need to keep
the descriptor to fetch-pack open in order to pass more data
to it.
We could solve that by using two descriptors, but our
run-command interface does not support that (and modifying
it to create more pipes would make life hard for the Windows
port of git).
Signed-off-by: Jeff King <peff@peff.net>
---
remote-curl.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/remote-curl.c b/remote-curl.c
index 73134f5..c7647c7 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -101,6 +101,15 @@ static int read_packets_until_flush(char **buf, size_t *len)
}
}
+static int verify_ref_advertisement(char *buf, size_t len)
+{
+ /*
+ * Our function parameters are copies, so we do not
+ * have to care that read_packets will increment our pointers.
+ */
+ return read_packets_until_flush(&buf, &len);
+}
+
static struct discovery* discover_refs(const char *service)
{
struct strbuf exp = STRBUF_INIT;
@@ -174,6 +183,9 @@ static struct discovery* discover_refs(const char *service)
die("smart-http metadata lines are invalid at %s",
refs_url);
+ if (verify_ref_advertisement(last->buf, last->len) < 0)
+ die("ref advertisement is invalid at %s", refs_url);
+
last->proto_git = 1;
}
--
1.8.1.2.11.g1a2f572
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 1/3] pkt-line: teach packet_get_line a no-op mode
2013-02-16 6:46 ` [PATCH 1/3] pkt-line: teach packet_get_line a no-op mode Jeff King
@ 2013-02-17 10:41 ` Jonathan Nieder
0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-17 10:41 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -234,9 +234,10 @@ int packet_get_line(struct strbuf *out,
> *src_len -= 4;
> len -= 4;
>
> - strbuf_add(out, *src_buf, len);
> + if (out)
> + strbuf_add(out, *src_buf, len);
> + packet_trace(*src_buf, len, 0);
> *src_buf += len;
> *src_len -= len;
> - packet_trace(out->buf, out->len, 0);
> return len;
For what it's worth,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
The above code has a structure of
prepare to return(buf, len);
trace(buf, len);
discard used part of buf;
return;
which is nice and readable.
Jonathan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 2/3] remote-curl: verify smart-http metadata lines
2013-02-16 6:47 ` [PATCH 2/3] remote-curl: verify smart-http metadata lines Jeff King
@ 2013-02-17 10:49 ` Jonathan Nieder
2013-02-17 19:14 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-17 10:49 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> --- a/remote-curl.c
> +++ b/remote-curl.c
[...]
> @@ -155,11 +166,13 @@ static struct discovery* discover_refs(const char *service)
[...]
> - strbuf_reset(&buffer);
> - while (packet_get_line(&buffer, &last->buf, &last->len) > 0)
> - strbuf_reset(&buffer);
> + if (read_packets_until_flush(&last->buf, &last->len) < 0)
Style nit: this made me wonder "What would it mean if
read_packets_until_flush() > 0?" Since the convention for this
function is "0 for success", I would personally find
if (read_packets_until_flush(...))
handle error;
easier to read.
> + die("smart-http metadata lines are invalid at %s",
> + refs_url);
Especially given that other clients would be likely to run into
trouble in the same situation, as long as this cooks in "next" for a
suitable amount of time to catch bad servers, it looks like a good
idea.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Thanks.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] remote-curl: sanity check ref advertisement from server
2013-02-16 6:49 ` [PATCH 3/3] remote-curl: sanity check ref advertisement from server Jeff King
@ 2013-02-17 11:05 ` Jonathan Nieder
2013-02-17 19:28 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-17 11:05 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> If the smart HTTP response from the server is truncated for
> any reason, we will get an incomplete ref advertisement. If
> we then feed this incomplete list to "fetch-pack", one of a
> few things may happen:
>
> 1. If the truncation is in a packet header, fetch-pack
> will notice the bogus line and complain.
>
> 2. If the truncation is inside a packet, fetch-pack will
> keep waiting for us to send the rest of the packet,
> which we never will.
Mostly harmless since the operator could hit ^C, but still unpleasant.
[...]
> This fortunately doesn't happen in the normal fetching
> workflow, because git-fetch first uses the "list" command,
> which feeds the refs to get_remote_heads, which does notice
> the error. However, you can trigger it by sending a direct
> "fetch" to the remote-curl helper.
Ah. Would a test for this make sense?
[...]
> --- a/remote-curl.c
> +++ b/remote-curl.c
[...]
> @@ -174,6 +183,9 @@ static struct discovery* discover_refs(const char *service)
> die("smart-http metadata lines are invalid at %s",
> refs_url);
>
> + if (verify_ref_advertisement(last->buf, last->len) < 0)
> + die("ref advertisement is invalid at %s", refs_url);
Won't this error out with
protocol error: bad line length character: ERR
instead of the current more helpful behavior for ERR lines?
Same stylistic comment about "what would it mean for the return value
to be positive?" as in patch 2/3.
Aside from those two details, the idea looks sane, though. Good
catch, and thanks for a pleasant read.
Good night,
Jonathan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 2/3] remote-curl: verify smart-http metadata lines
2013-02-17 10:49 ` Jonathan Nieder
@ 2013-02-17 19:14 ` Jeff King
2013-02-18 0:54 ` Jonathan Nieder
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-17 19:14 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Sun, Feb 17, 2013 at 02:49:39AM -0800, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> [...]
> > @@ -155,11 +166,13 @@ static struct discovery* discover_refs(const char *service)
> [...]
> > - strbuf_reset(&buffer);
> > - while (packet_get_line(&buffer, &last->buf, &last->len) > 0)
> > - strbuf_reset(&buffer);
> > + if (read_packets_until_flush(&last->buf, &last->len) < 0)
>
> Style nit: this made me wonder "What would it mean if
> read_packets_until_flush() > 0?" Since the convention for this
> function is "0 for success", I would personally find
>
> if (read_packets_until_flush(...))
> handle error;
>
> easier to read.
My intent was that it followed the error convention of "negative is
error, 0 is success, and positive is not used, but reserved for
future use". And I tend to think the "< 0" makes it obvious that we are
interested in error. But I don't feel that strongly, so if people would
rather see it the other way, I can live with it.
> > + die("smart-http metadata lines are invalid at %s",
> > + refs_url);
>
> Especially given that other clients would be likely to run into
> trouble in the same situation, as long as this cooks in "next" for a
> suitable amount of time to catch bad servers, it looks like a good
> idea.
Yeah, I have a slight concern that this series would break something in
another implementation, so I would like to see this cook in "next" for a
while (and would be slated for master probably not in this release, but
in the next one). But I think this change is pretty straightforward. If
an implementation is producing bogus packet lines and expecting us not
to complain, it really needs to be fixed.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] remote-curl: sanity check ref advertisement from server
2013-02-17 11:05 ` Jonathan Nieder
@ 2013-02-17 19:28 ` Jeff King
2013-02-18 1:41 ` Jonathan Nieder
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-17 19:28 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Sun, Feb 17, 2013 at 03:05:34AM -0800, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > If the smart HTTP response from the server is truncated for
> > any reason, we will get an incomplete ref advertisement. If
> > we then feed this incomplete list to "fetch-pack", one of a
> > few things may happen:
> >
> > 1. If the truncation is in a packet header, fetch-pack
> > will notice the bogus line and complain.
> >
> > 2. If the truncation is inside a packet, fetch-pack will
> > keep waiting for us to send the rest of the packet,
> > which we never will.
>
> Mostly harmless since the operator could hit ^C, but still unpleasant.
Fetching is not always interactive. The deadlock I ran into (and again,
I am not sure if this fixes it or not, but it is _a_ deadlock) was on a
server farm doing a large number of "fetch && checkout && deploy"
operations. Only some of them hung, but it took a while to figure out
what was going on.
> [...]
> > This fortunately doesn't happen in the normal fetching
> > workflow, because git-fetch first uses the "list" command,
> > which feeds the refs to get_remote_heads, which does notice
> > the error. However, you can trigger it by sending a direct
> > "fetch" to the remote-curl helper.
>
> Ah. Would a test for this make sense?
A test would be great, if you can devise a way to reliably produce
truncated git output (but still valid http output). In the real-world
problem I had, I believe the truncation was caused by an intermediate
reverse proxy that hit a timeout. I simulated truncation by using netcat
to replay munged http headers and git output.
I suspect the simplest portable thing would be a static file of
truncated git output, served by apache, which would need custom
configuration to serve it with the correct content-type header. It
seemed like a lot of test infrastructure to check for a very specific
thing, so I abandoned trying to make a test.
> > + if (verify_ref_advertisement(last->buf, last->len) < 0)
> > + die("ref advertisement is invalid at %s", refs_url);
>
> Won't this error out with
>
> protocol error: bad line length character: ERR
>
> instead of the current more helpful behavior for ERR lines?
I don't think so. Don't ERR lines appear inside their own packets? We
are just verifying that our packets are syntactically correct here, and
my reading of get_remote_heads is that the ERR appears inside the
packetized data.
The one thing we do also check, though, is that we end with a flush
packet. So depending on what servers produce, it may mean we trigger
this complaint instead of passing the ERR along to fetch-pack.
Rather than doing this fake syntactic verification, I wonder if we
should simply call get_remote_heads, which does a more thorough check
(and is what we _would_ call in the list case, and what fetch-pack will
call once we pass data to it). It's slightly less efficient, in that it
starts a new thread and actually builds the linked list of refs. But it
probably isn't that big a deal (and normal operation does a "list" first
which does that _anyway_).
> Same stylistic comment about "what would it mean for the return value
> to be positive?" as in patch 2/3.
Same response. :)
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 2/3] remote-curl: verify smart-http metadata lines
2013-02-17 19:14 ` Jeff King
@ 2013-02-18 0:54 ` Jonathan Nieder
2013-02-18 8:59 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 0:54 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> On Sun, Feb 17, 2013 at 02:49:39AM -0800, Jonathan Nieder wrote:
>> Jeff King wrote:
>>> --- a/remote-curl.c
[...]
>>> + if (read_packets_until_flush(&last->buf, &last->len) < 0)
>>
>> Style nit: this made me wonder "What would it mean if
>> read_packets_until_flush() > 0?"
[...]
> My intent was that it followed the error convention of "negative is
> error, 0 is success, and positive is not used, but reserved for
> future use".
>From a maintainability perspective, that kind of contract would be
dangerous, since some *other* caller could arrive and use the function
without a "< 0" without knowing it is doing anything wrong. When new
return values appear, the function should be renamed to help the patch
author and reviewers remember to check all callers.
That is, from the point of view of maintainability, there is no
distinction between "if (read_packets_until_... < 0)" and
"if (read_packets_until_...)" and either form is fine.
My comment was just to say the "< 0" forced me to pause a moment and
check out the implementation. This is basically a stylistic thing and
if you prefer to keep the "< 0", that's fine with me.
> If
> an implementation is producing bogus packet lines and expecting us not
> to complain, it really needs to be fixed.
Agreed completely. Thanks again for the patch.
Jonathan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] remote-curl: sanity check ref advertisement from server
2013-02-17 19:28 ` Jeff King
@ 2013-02-18 1:41 ` Jonathan Nieder
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 1:41 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> On Sun, Feb 17, 2013 at 03:05:34AM -0800, Jonathan Nieder wrote:
>> Jeff King wrote:
>>> + if (verify_ref_advertisement(last->buf, last->len) < 0)
>>> + die("ref advertisement is invalid at %s", refs_url);
>>
>> Won't this error out with
>>
>> protocol error: bad line length character: ERR
>>
>> instead of the current more helpful behavior for ERR lines?
>
> I don't think so. Don't ERR lines appear inside their own packets?
Yes, I misread get_remote_heads for some reason. Thanks for checking.
[...]
> The one thing we do also check, though, is that we end with a flush
> packet. So depending on what servers produce, it may mean we trigger
> this complaint instead of passing the ERR along to fetch-pack.
>
> Rather than doing this fake syntactic verification, I wonder if we
> should simply call get_remote_heads, which does a more thorough check
I'm not sure whether servers are expected to send a flush after an
ERR packet. The only codepath I know of in git itself that sends
such packets is git-daemon, which does not flush after the error (but
is not used in the stateless-rpc case). http-backend uses HTTP error
codes for its errors.
If I am reading get_remote_heads correctly, calling it with the
following tweak should work ok. The extra thread is just to feed a
string into a fd-based interface and could be avoided for "list", too,
if it costs too much.
diff --git i/connect.c w/connect.c
index 49e56ba3..55ee7de7 100644
--- i/connect.c
+++ w/connect.c
@@ -68,7 +68,8 @@ struct ref **get_remote_heads(int in, struct ref **list,
{
int got_at_least_one_head = 0;
- *list = NULL;
+ if (list)
+ *list = NULL;
for (;;) {
struct ref *ref;
unsigned char old_sha1[20];
@@ -92,6 +93,9 @@ struct ref **get_remote_heads(int in, struct ref **list,
die("protocol error: expected sha/ref, got '%s'", buffer);
name = buffer + 41;
+ if (!list)
+ continue;
+
name_len = strlen(name);
if (len != name_len + 41) {
free(server_capabilities);
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 2/3] remote-curl: verify smart-http metadata lines
2013-02-18 0:54 ` Jonathan Nieder
@ 2013-02-18 8:59 ` Jeff King
0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 8:59 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Sun, Feb 17, 2013 at 04:54:43PM -0800, Jonathan Nieder wrote:
> > My intent was that it followed the error convention of "negative is
> > error, 0 is success, and positive is not used, but reserved for
> > future use".
>
> From a maintainability perspective, that kind of contract would be
> dangerous, since some *other* caller could arrive and use the function
> without a "< 0" without knowing it is doing anything wrong. When new
> return values appear, the function should be renamed to help the patch
> author and reviewers remember to check all callers.
True. That's why I always write "< 0". :)
> That is, from the point of view of maintainability, there is no
> distinction between "if (read_packets_until_... < 0)" and
> "if (read_packets_until_...)" and either form is fine.
>
> My comment was just to say the "< 0" forced me to pause a moment and
> check out the implementation. This is basically a stylistic thing and
> if you prefer to keep the "< 0", that's fine with me.
Interesting. To me, "foo() < 0" just reads idiomatically as "error-check
the foo call".
Anyway, I've redone the patch series to just re-use get_remote_heads,
which is more robust. So this function has gone away in the new version.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCHv2 0/10] pkt-line and remote-curl cleanups server
2013-02-18 1:41 ` Jonathan Nieder
@ 2013-02-18 9:12 ` Jeff King
2013-02-18 9:14 ` [PATCHv2 01/10] pkt-line: move a misplaced comment Jeff King
` (11 more replies)
0 siblings, 12 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:12 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Sun, Feb 17, 2013 at 05:41:13PM -0800, Jonathan Nieder wrote:
> > I don't think so. Don't ERR lines appear inside their own packets?
>
> Yes, I misread get_remote_heads for some reason. Thanks for checking.
Thanks for bringing it up. I had not even thought about ERR at all. So
it was luck rather than skill that I was right. :)
> I'm not sure whether servers are expected to send a flush after an
> ERR packet. The only codepath I know of in git itself that sends
> such packets is git-daemon, which does not flush after the error (but
> is not used in the stateless-rpc case). http-backend uses HTTP error
> codes for its errors.
I just checked, and GitHub also does not send flush packets after ERR.
Which makes sense; ERR is supposed to end the conversation. I can change
GitHub, of course, but who knows what other implementations exist (e.g.,
I do not know off-hand whether gitolite has custom ERR responses). So it
seems pretty clear that just checking for a flush packet is not the
right thing, and we need to actually parse the packet contents (at least
to some degree).
> If I am reading get_remote_heads correctly, calling it with the
> following tweak should work ok. The extra thread is just to feed a
> string into a fd-based interface and could be avoided for "list", too,
> if it costs too much.
Yeah, your patch does work, though we miss out on some of the refname
checks. I think what I'd rather do is just teach get_remote_heads to
read from a buffer (to avoid the extra thread and pipe), and then just
run (and cache) the ref parsing unconditionally once we've read from the
server. It shouldn't make a difference in the normal case, as we would
usually do a "list" anyway (and by caching, "list" can just feed out the
cached copy).
While looking into this, I noticed a bunch of other possible cleanups.
Patches to follow:
[01/10]: pkt-line: move a misplaced comment
[02/10]: pkt-line: drop safe_write function
[03/10]: pkt-line: clean up "gentle" reading function
[04/10]: pkt-line: change error message for oversized packet
[05/10]: pkt-line: rename s/packet_read_line/packet_read/
These are all just cleanups I noticed while looking at pkt-line. Any of
them can be dropped, though there would be some textual conflicts for
the later patches.
[06/10]: pkt-line: share buffer/descriptor reading implementation
[07/10]: teach get_remote_heads to read from a memory buffer
[08/10]: remote-curl: pass buffer straight to get_remote_heads
These all build on each other to get rid of the extra thread/pipe, which
I think is worth doing even without the rest of the series.
[09/10]: remote-curl: move ref-parsing code up in file
[10/10]: remote-curl: always parse incoming refs
And these ones actually fix the problem I noticed.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCHv2 01/10] pkt-line: move a misplaced comment
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
@ 2013-02-18 9:14 ` Jeff King
2013-02-18 9:15 ` [PATCHv2 02/10] pkt-line: drop safe_write function Jeff King
` (10 subsequent siblings)
11 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:14 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
The comment describing the packet writing interface was
originally written above packet_write, but migrated to be
above safe_write in f3a3214, probably because it is meant to
generally describe the packet writing interface and not a
single function. Let's move it into the header file, where
users of the interface are more likely to see it.
Signed-off-by: Jeff King <peff@peff.net>
---
I just left the comment intact as I moved it. It kind of implies to me
that you hand a big buffer to these functions and they would packetize
it for you, which is not true. I don't know if anybody else sees that;
it might be worth tweaking the text.
pkt-line.c | 15 ---------------
pkt-line.h | 14 +++++++++++++-
2 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/pkt-line.c b/pkt-line.c
index eaba15f..5138f47 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -46,21 +46,6 @@ static void packet_trace(const char *buf, unsigned int len, int write)
strbuf_release(&out);
}
-/*
- * Write a packetized stream, where each line is preceded by
- * its length (including the header) as a 4-byte hex number.
- * A length of 'zero' means end of stream (and a length of 1-3
- * would be an error).
- *
- * This is all pretty stupid, but we use this packetized line
- * format to make a streaming format possible without ever
- * over-running the read buffers. That way we'll never read
- * into what might be the pack data (which should go to another
- * process entirely).
- *
- * The writing side could use stdio, but since the reading
- * side can't, we stay with pure read/write interfaces.
- */
ssize_t safe_write(int fd, const void *buf, ssize_t n)
{
ssize_t nn = n;
diff --git a/pkt-line.h b/pkt-line.h
index 8cfeb0c..7a67e9c 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -5,7 +5,19 @@
#include "strbuf.h"
/*
- * Silly packetized line writing interface
+ * Write a packetized stream, where each line is preceded by
+ * its length (including the header) as a 4-byte hex number.
+ * A length of 'zero' means end of stream (and a length of 1-3
+ * would be an error).
+ *
+ * This is all pretty stupid, but we use this packetized line
+ * format to make a streaming format possible without ever
+ * over-running the read buffers. That way we'll never read
+ * into what might be the pack data (which should go to another
+ * process entirely).
+ *
+ * The writing side could use stdio, but since the reading
+ * side can't, we stay with pure read/write interfaces.
*/
void packet_flush(int fd);
void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 02/10] pkt-line: drop safe_write function
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
2013-02-18 9:14 ` [PATCHv2 01/10] pkt-line: move a misplaced comment Jeff King
@ 2013-02-18 9:15 ` Jeff King
2013-02-18 9:56 ` Jonathan Nieder
2013-02-18 9:16 ` [PATCHv2 03/10] pkt-line: clean up "gentle" reading function Jeff King
` (9 subsequent siblings)
11 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:15 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
This is just write_or_die by another name.
Signed-off-by: Jeff King <peff@peff.net>
---
Actually, they are not quite the same. write_or_die will exit(0) when it
sees EPIPE. Which makes me a little nervous.
builtin/receive-pack.c | 2 +-
builtin/send-pack.c | 2 +-
fetch-pack.c | 2 +-
http-backend.c | 8 ++++----
pkt-line.c | 21 ++-------------------
remote-curl.c | 4 ++--
send-pack.c | 2 +-
sideband.c | 9 +++++----
upload-pack.c | 3 ++-
9 files changed, 19 insertions(+), 34 deletions(-)
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e8878de..48cd5dc 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -908,7 +908,7 @@ static void report(struct command *commands, const char *unpack_status)
if (use_sideband)
send_sideband(1, 1, buf.buf, buf.len, use_sideband);
else
- safe_write(1, buf.buf, buf.len);
+ write_or_die(1, buf.buf, buf.len);
strbuf_release(&buf);
}
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 57a46b2..8778519 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -79,7 +79,7 @@ static void print_helper_status(struct ref *ref)
}
strbuf_addch(&buf, '\n');
- safe_write(1, buf.buf, buf.len);
+ write_or_die(1, buf.buf, buf.len);
}
strbuf_release(&buf);
}
diff --git a/fetch-pack.c b/fetch-pack.c
index 6d8926a..b90dadf 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -245,7 +245,7 @@ static void send_request(struct fetch_pack_args *args,
send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
packet_flush(fd);
} else
- safe_write(fd, buf->buf, buf->len);
+ write_or_die(fd, buf->buf, buf->len);
}
static void insert_one_alternate_ref(const struct ref *ref, void *unused)
diff --git a/http-backend.c b/http-backend.c
index f50e77f..8144f3a 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -70,7 +70,7 @@ static void format_write(int fd, const char *fmt, ...)
if (n >= sizeof(buffer))
die("protocol error: impossibly long line");
- safe_write(fd, buffer, n);
+ write_or_die(fd, buffer, n);
}
static void http_status(unsigned code, const char *msg)
@@ -111,7 +111,7 @@ static void end_headers(void)
static void end_headers(void)
{
- safe_write(1, "\r\n", 2);
+ write_or_die(1, "\r\n", 2);
}
__attribute__((format (printf, 1, 2)))
@@ -157,7 +157,7 @@ static void send_strbuf(const char *type, struct strbuf *buf)
hdr_int(content_length, buf->len);
hdr_str(content_type, type);
end_headers();
- safe_write(1, buf->buf, buf->len);
+ write_or_die(1, buf->buf, buf->len);
}
static void send_local_file(const char *the_type, const char *name)
@@ -185,7 +185,7 @@ static void send_local_file(const char *the_type, const char *name)
die_errno("Cannot read '%s'", p);
if (!n)
break;
- safe_write(1, buf, n);
+ write_or_die(1, buf, n);
}
close(fd);
free(buf);
diff --git a/pkt-line.c b/pkt-line.c
index 5138f47..699c2dd 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -46,23 +46,6 @@ static void packet_trace(const char *buf, unsigned int len, int write)
strbuf_release(&out);
}
-ssize_t safe_write(int fd, const void *buf, ssize_t n)
-{
- ssize_t nn = n;
- while (n) {
- int ret = xwrite(fd, buf, n);
- if (ret > 0) {
- buf = (char *) buf + ret;
- n -= ret;
- continue;
- }
- if (!ret)
- die("write error (disk full?)");
- die_errno("write error");
- }
- return nn;
-}
-
/*
* If we buffered things up above (we don't, but we should),
* we'd flush it here
@@ -70,7 +53,7 @@ void packet_flush(int fd)
void packet_flush(int fd)
{
packet_trace("0000", 4, 1);
- safe_write(fd, "0000", 4);
+ write_or_die(fd, "0000", 4);
}
void packet_buf_flush(struct strbuf *buf)
@@ -106,7 +89,7 @@ void packet_write(int fd, const char *fmt, ...)
va_start(args, fmt);
n = format_packet(fmt, args);
va_end(args);
- safe_write(fd, buffer, n);
+ write_or_die(fd, buffer, n);
}
void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
diff --git a/remote-curl.c b/remote-curl.c
index 933c69a..7be4b53 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -685,7 +685,7 @@ static int fetch_git(struct discovery *heads,
err = rpc_service(&rpc, heads);
if (rpc.result.len)
- safe_write(1, rpc.result.buf, rpc.result.len);
+ write_or_die(1, rpc.result.buf, rpc.result.len);
strbuf_release(&rpc.result);
strbuf_release(&preamble);
free(depth_arg);
@@ -805,7 +805,7 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
err = rpc_service(&rpc, heads);
if (rpc.result.len)
- safe_write(1, rpc.result.buf, rpc.result.len);
+ write_or_die(1, rpc.result.buf, rpc.result.len);
strbuf_release(&rpc.result);
free(argv);
return err;
diff --git a/send-pack.c b/send-pack.c
index 97ab336..a99a1e4 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -281,7 +281,7 @@ int send_pack(struct send_pack_args *args,
send_sideband(out, -1, req_buf.buf, req_buf.len, LARGE_PACKET_MAX);
}
} else {
- safe_write(out, req_buf.buf, req_buf.len);
+ write_or_die(out, req_buf.buf, req_buf.len);
packet_flush(out);
}
strbuf_release(&req_buf);
diff --git a/sideband.c b/sideband.c
index d5ffa1c..8f7b25b 100644
--- a/sideband.c
+++ b/sideband.c
@@ -1,3 +1,4 @@
+#include "cache.h"
#include "pkt-line.h"
#include "sideband.h"
@@ -108,7 +109,7 @@ int recv_sideband(const char *me, int in_stream, int out)
} while (len);
continue;
case 1:
- safe_write(out, buf + pf+1, len);
+ write_or_die(out, buf + pf+1, len);
continue;
default:
fprintf(stderr, "%s: protocol error: bad band #%d\n",
@@ -138,12 +139,12 @@ ssize_t send_sideband(int fd, int band, const char *data, ssize_t sz, int packet
if (0 <= band) {
sprintf(hdr, "%04x", n + 5);
hdr[4] = band;
- safe_write(fd, hdr, 5);
+ write_or_die(fd, hdr, 5);
} else {
sprintf(hdr, "%04x", n + 4);
- safe_write(fd, hdr, 4);
+ write_or_die(fd, hdr, 4);
}
- safe_write(fd, p, n);
+ write_or_die(fd, p, n);
p += n;
sz -= n;
}
diff --git a/upload-pack.c b/upload-pack.c
index 7c05b15..04d6bd7 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,7 +69,8 @@ static ssize_t send_client_data(int fd, const char *data, ssize_t sz)
xwrite(fd, data, sz);
return sz;
}
- return safe_write(fd, data, sz);
+ write_or_die(fd, data, sz);
+ return sz;
}
static FILE *pack_pipe = NULL;
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 03/10] pkt-line: clean up "gentle" reading function
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
2013-02-18 9:14 ` [PATCHv2 01/10] pkt-line: move a misplaced comment Jeff King
2013-02-18 9:15 ` [PATCHv2 02/10] pkt-line: drop safe_write function Jeff King
@ 2013-02-18 9:16 ` Jeff King
2013-02-18 10:12 ` Jonathan Nieder
2013-02-18 9:22 ` [PATCHv2 04/10] pkt-line: change error message for oversized packet Jeff King
` (8 subsequent siblings)
11 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:16 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
Originally we had a single function for reading packetized
data: packet_read_line. Commit 46284dd grew a more "gentle"
form that would return an error instead of dying upon
reading a truncated input stream. However:
1. The two functions were called "packet_read" and
"packet_read_line", with no indication that the only
difference is in the error handling.
2. There was no documentation about which error conditions
were handled in the gentle form, and which still caused
a death.
3. The internal variable to trigger the gentle mode was
called "return_line_fail", which was not very
expressive.
This patch converts packet_line to packet_read_line_gently
to more clearly indicate its relationship to
packet_read_line, and renames the internal variable to
"gently". This is also not incredibly expressive, but it is
at least a convention within the git code. And finally, we
document the exact behavior for the gentle and non-gentle
modes.
While we are cleaning up the names, we can drop the
"return_line_fail" checks in packet_read_internal entirely.
They look like this:
ret = safe_read(..., return_line_fail);
if (return_line_fail && ret < 0)
...
The check for return_line_fail is a no-op; safe_read will
only ever return an error value if we passed it
return_line_fail in the first place.
Signed-off-by: Jeff King <peff@peff.net>
---
Obviously this one is a matter of taste, but I think the result is much
better. Certainly the documentation bits are hard to argue with. :)
connect.c | 2 +-
pkt-line.c | 16 ++++++++--------
pkt-line.h | 21 ++++++++++++++++++++-
3 files changed, 29 insertions(+), 10 deletions(-)
diff --git a/connect.c b/connect.c
index 49e56ba..7e0920d 100644
--- a/connect.c
+++ b/connect.c
@@ -76,7 +76,7 @@ struct ref **get_remote_heads(int in, struct ref **list,
char *name;
int len, name_len;
- len = packet_read(in, buffer, sizeof(buffer));
+ len = packet_read_line_gently(in, buffer, sizeof(buffer));
if (len < 0)
die_initial_contact(got_at_least_one_head);
diff --git a/pkt-line.c b/pkt-line.c
index 699c2dd..62479d3 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -103,13 +103,13 @@ static int safe_read(int fd, void *buffer, unsigned size, int return_line_fail)
strbuf_add(buf, buffer, n);
}
-static int safe_read(int fd, void *buffer, unsigned size, int return_line_fail)
+static int safe_read(int fd, void *buffer, unsigned size, int gently)
{
ssize_t ret = read_in_full(fd, buffer, size);
if (ret < 0)
die_errno("read error");
else if (ret < size) {
- if (return_line_fail)
+ if (gently)
return -1;
die("The remote end hung up unexpectedly");
@@ -143,13 +143,13 @@ static int packet_read_internal(int fd, char *buffer, unsigned size, int return_
return len;
}
-static int packet_read_internal(int fd, char *buffer, unsigned size, int return_line_fail)
+static int packet_read_internal(int fd, char *buffer, unsigned size, int gently)
{
int len, ret;
char linelen[4];
- ret = safe_read(fd, linelen, 4, return_line_fail);
- if (return_line_fail && ret < 0)
+ ret = safe_read(fd, linelen, 4, gently);
+ if (ret < 0)
return ret;
len = packet_length(linelen);
if (len < 0)
@@ -161,15 +161,15 @@ static int packet_read_internal(int fd, char *buffer, unsigned size, int return_
len -= 4;
if (len >= size)
die("protocol error: bad line length %d", len);
- ret = safe_read(fd, buffer, len, return_line_fail);
- if (return_line_fail && ret < 0)
+ ret = safe_read(fd, buffer, len, gently);
+ if (ret < 0)
return ret;
buffer[len] = 0;
packet_trace(buffer, len, 0);
return len;
}
-int packet_read(int fd, char *buffer, unsigned size)
+int packet_read_line_gently(int fd, char *buffer, unsigned size)
{
return packet_read_internal(fd, buffer, size, 1);
}
diff --git a/pkt-line.h b/pkt-line.h
index 7a67e9c..31bd069 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -24,8 +24,27 @@ int packet_read_line(int fd, char *buffer, unsigned size);
void packet_buf_flush(struct strbuf *buf);
void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+/*
+ * Read a packetized line from the descriptor into the buffer. We will die
+ * under any of the following conditions:
+ *
+ * 1. Read error from descriptor.
+ *
+ * 2. Protocol error from the remote (e.g., bogus length characters).
+ *
+ * 3. Receiving a packet larger than "size" bytes.
+ *
+ * 4. Truncated output from the remote (e.g., we expected a packet but got
+ * EOF, or we got a partial packet followed by EOF).
+ */
int packet_read_line(int fd, char *buffer, unsigned size);
-int packet_read(int fd, char *buffer, unsigned size);
+
+/*
+ * Same as packet_read_line, but do not die on condition 4 (truncated input);
+ * instead return -1. We still die for the other conditions.
+ */
+int packet_read_line_gently(int fd, char *buffer, unsigned size);
+
int packet_get_line(struct strbuf *out, char **src_buf, size_t *src_len);
ssize_t safe_write(int, const void *, ssize_t);
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (2 preceding siblings ...)
2013-02-18 9:16 ` [PATCHv2 03/10] pkt-line: clean up "gentle" reading function Jeff King
@ 2013-02-18 9:22 ` Jeff King
2013-02-18 9:40 ` Junio C Hamano
2013-02-18 10:15 ` Jonathan Nieder
2013-02-18 9:22 ` [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/ Jeff King
` (7 subsequent siblings)
11 siblings, 2 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:22 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
If we get a packet from the remote side that is too large to
fit in our buffer, we currently complain "protocol error:
bad line length". This is a bit vague. The line length the
other side sent is not "bad" per se; the actual problem is
that it exceeded our expectation for buffer length.
This will generally not happen between two git-core
implementations, because the sender limits themselves to
either 1000, or to LARGE_PACKET_MAX (depending on what is
being sent, sideband-64k negotiation, etc), and the receiver
uses a buffer of the appropriate size.
The protocol document indicates the LARGE_PACKET_MAX limit
(of 65520), but does not actually specify the 1000-byte
limit for ref lines. It is likely that other implementations
just create a packet as large as they need, and this doesn't
come up often because nobody has 1000-character ref names
(or close to it, counting sha1 and other boilerplate).
We may want to increase the size of our receive buffers for
ref lines to prepare for such writers (whether they are
other implementations, or if we eventually want to bump the
write size in git-core). In the meantime, this patch tries
to give a more clear message in case it does come up.
Signed-off-by: Jeff King <peff@peff.net>
---
I'm really tempted to bump all of our 1000-byte buffers to just use
LARGE_PACKET_MAX. If we provided a packet_read variant that used a
static buffer (which is fine for all but one or two callers), then it
would not take much memory (right now we stick some LARGE_PACKET_MAX
buffers on the stack, which is slightly questionable for
stack-restricted systems). But I left that for a different topic (and
even if we do, we would still want this message to catch anything over
the bizarre 65520 limit).
Out of curiosity, I grepped the list archives, and found only one
instance of this message. And it was somebody whose data stream was tainted
with random crud that happened to be numbers (much more common is "bad line
length character", when the crud does not look like a packet length).
pkt-line.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/pkt-line.c b/pkt-line.c
index 62479d3..f2a7575 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -160,7 +160,8 @@ static int packet_read_internal(int fd, char *buffer, unsigned size, int gently)
}
len -= 4;
if (len >= size)
- die("protocol error: bad line length %d", len);
+ die("protocol error: line too large: (expected %u, got %d)",
+ size, len);
ret = safe_read(fd, buffer, len, gently);
if (ret < 0)
return ret;
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (3 preceding siblings ...)
2013-02-18 9:22 ` [PATCHv2 04/10] pkt-line: change error message for oversized packet Jeff King
@ 2013-02-18 9:22 ` Jeff King
2013-02-18 10:19 ` Jonathan Nieder
2013-02-18 9:26 ` [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation Jeff King
` (6 subsequent siblings)
11 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:22 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
Originally packets were used just for the line-oriented ref
advertisement and negotiation. These days, we also stuff
packfiles and sidebands into them, and they do not
necessarily represent a line. Drop the "_line" suffix, as it
is not informative and makes the function names quite long
(especially as we add "_gently" and other variants).
Signed-off-by: Jeff King <peff@peff.net>
---
Again, this is a taste issue. Can be optional.
builtin/archive.c | 4 ++--
builtin/fetch-pack.c | 2 +-
builtin/receive-pack.c | 2 +-
builtin/upload-archive.c | 2 +-
connect.c | 2 +-
daemon.c | 2 +-
fetch-pack.c | 6 +++---
pkt-line.c | 4 ++--
pkt-line.h | 6 +++---
remote-curl.c | 6 +++---
send-pack.c | 4 ++--
sideband.c | 2 +-
upload-pack.c | 4 ++--
13 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/builtin/archive.c b/builtin/archive.c
index 9a1cfd3..bf600f5 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -53,7 +53,7 @@ static int run_remote_archiver(int argc, const char **argv,
packet_write(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
- len = packet_read_line(fd[0], buf, sizeof(buf));
+ len = packet_read(fd[0], buf, sizeof(buf));
if (!len)
die(_("git archive: expected ACK/NAK, got EOF"));
if (buf[len-1] == '\n')
@@ -66,7 +66,7 @@ static int run_remote_archiver(int argc, const char **argv,
die(_("git archive: protocol error"));
}
- len = packet_read_line(fd[0], buf, sizeof(buf));
+ len = packet_read(fd[0], buf, sizeof(buf));
if (len)
die(_("git archive: expected a flush"));
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 940ae35..b59e60e 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -102,7 +102,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
*/
static char line[1000];
for (;;) {
- int n = packet_read_line(0, line, sizeof(line));
+ int n = packet_read(0, line, sizeof(line));
if (!n)
break;
if (line[n-1] == '\n')
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 48cd5dc..3f58ce8 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -736,7 +736,7 @@ static struct command *read_head_info(void)
char *refname;
int len, reflen;
- len = packet_read_line(0, line, sizeof(line));
+ len = packet_read(0, line, sizeof(line));
if (!len)
break;
if (line[len-1] == '\n')
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index b928beb..2ecf461 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -40,7 +40,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
sent_argv[0] = "git-upload-archive";
for (p = buf;;) {
/* This will die if not enough free space in buf */
- len = packet_read_line(0, p, (buf + sizeof buf) - p);
+ len = packet_read(0, p, (buf + sizeof buf) - p);
if (len == 0)
break; /* got a flush */
if (sent_argc > MAX_ARGS - 2)
diff --git a/connect.c b/connect.c
index 7e0920d..59266b1 100644
--- a/connect.c
+++ b/connect.c
@@ -76,7 +76,7 @@ struct ref **get_remote_heads(int in, struct ref **list,
char *name;
int len, name_len;
- len = packet_read_line_gently(in, buffer, sizeof(buffer));
+ len = packet_read_gently(in, buffer, sizeof(buffer));
if (len < 0)
die_initial_contact(got_at_least_one_head);
diff --git a/daemon.c b/daemon.c
index 4602b46..ac26b93 100644
--- a/daemon.c
+++ b/daemon.c
@@ -612,7 +612,7 @@ static int execute(void)
loginfo("Connection from %s:%s", addr, port);
alarm(init_timeout ? init_timeout : timeout);
- pktlen = packet_read_line(0, line, sizeof(line));
+ pktlen = packet_read(0, line, sizeof(line));
alarm(0);
len = strlen(line);
diff --git a/fetch-pack.c b/fetch-pack.c
index b90dadf..d89c9d0 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -173,7 +173,7 @@ static void consume_shallow_list(struct fetch_pack_args *args, int fd)
* is a block of have lines exchanged.
*/
char line[1000];
- while (packet_read_line(fd, line, sizeof(line))) {
+ while (packet_read(fd, line, sizeof(line))) {
if (!prefixcmp(line, "shallow "))
continue;
if (!prefixcmp(line, "unshallow "))
@@ -216,7 +216,7 @@ static enum ack_type get_ack(int fd, unsigned char *result_sha1)
static enum ack_type get_ack(int fd, unsigned char *result_sha1)
{
static char line[1000];
- int len = packet_read_line(fd, line, sizeof(line));
+ int len = packet_read(fd, line, sizeof(line));
if (!len)
die("git fetch-pack: expected ACK/NAK, got EOF");
@@ -350,7 +350,7 @@ static int find_common(struct fetch_pack_args *args,
unsigned char sha1[20];
send_request(args, fd[1], &req_buf);
- while (packet_read_line(fd[0], line, sizeof(line))) {
+ while (packet_read(fd[0], line, sizeof(line))) {
if (!prefixcmp(line, "shallow ")) {
if (get_sha1_hex(line + 8, sha1))
die("invalid shallow line: %s", line);
diff --git a/pkt-line.c b/pkt-line.c
index f2a7575..85faf73 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -170,12 +170,12 @@ int packet_read_line_gently(int fd, char *buffer, unsigned size)
return len;
}
-int packet_read_line_gently(int fd, char *buffer, unsigned size)
+int packet_read_gently(int fd, char *buffer, unsigned size)
{
return packet_read_internal(fd, buffer, size, 1);
}
-int packet_read_line(int fd, char *buffer, unsigned size)
+int packet_read(int fd, char *buffer, unsigned size)
{
return packet_read_internal(fd, buffer, size, 0);
}
diff --git a/pkt-line.h b/pkt-line.h
index 31bd069..2dc4941 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -37,13 +37,13 @@ int packet_read_line(int fd, char *buffer, unsigned size);
* 4. Truncated output from the remote (e.g., we expected a packet but got
* EOF, or we got a partial packet followed by EOF).
*/
-int packet_read_line(int fd, char *buffer, unsigned size);
+int packet_read(int fd, char *buffer, unsigned size);
/*
- * Same as packet_read_line, but do not die on condition 4 (truncated input);
+ * Same as packet_read, but do not die on condition 4 (truncated input);
* instead return -1. We still die for the other conditions.
*/
-int packet_read_line_gently(int fd, char *buffer, unsigned size);
+int packet_read_gently(int fd, char *buffer, unsigned size);
int packet_get_line(struct strbuf *out, char **src_buf, size_t *src_len);
ssize_t safe_write(int, const void *, ssize_t);
diff --git a/remote-curl.c b/remote-curl.c
index 7be4b53..99cc016 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -308,7 +308,7 @@ static size_t rpc_out(void *ptr, size_t eltsize,
if (!avail) {
rpc->initial_buffer = 0;
- avail = packet_read_line(rpc->out, rpc->buf, rpc->alloc);
+ avail = packet_read(rpc->out, rpc->buf, rpc->alloc);
if (!avail)
return 0;
rpc->pos = 0;
@@ -425,7 +425,7 @@ static int post_rpc(struct rpc_state *rpc)
break;
}
- n = packet_read_line(rpc->out, buf, left);
+ n = packet_read(rpc->out, buf, left);
if (!n)
break;
rpc->len += n;
@@ -579,7 +579,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
rpc->hdr_accept = strbuf_detach(&buf, NULL);
while (!err) {
- int n = packet_read_line(rpc->out, rpc->buf, rpc->alloc);
+ int n = packet_read(rpc->out, rpc->buf, rpc->alloc);
if (!n)
break;
rpc->pos = 0;
diff --git a/send-pack.c b/send-pack.c
index a99a1e4..7b22235 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -108,7 +108,7 @@ static int receive_status(int in, struct ref *refs)
struct ref *hint;
char line[1000];
int ret = 0;
- int len = packet_read_line(in, line, sizeof(line));
+ int len = packet_read(in, line, sizeof(line));
if (len < 10 || memcmp(line, "unpack ", 7))
return error("did not receive remote status");
if (memcmp(line, "unpack ok\n", 10)) {
@@ -122,7 +122,7 @@ static int receive_status(int in, struct ref *refs)
while (1) {
char *refname;
char *msg;
- len = packet_read_line(in, line, sizeof(line));
+ len = packet_read(in, line, sizeof(line));
if (!len)
break;
if (len < 3 ||
diff --git a/sideband.c b/sideband.c
index 8f7b25b..4a6446d 100644
--- a/sideband.c
+++ b/sideband.c
@@ -38,7 +38,7 @@ int recv_sideband(const char *me, int in_stream, int out)
while (1) {
int band, len;
- len = packet_read_line(in_stream, buf + pf, LARGE_PACKET_MAX);
+ len = packet_read(in_stream, buf + pf, LARGE_PACKET_MAX);
if (len == 0)
break;
if (len < 1) {
diff --git a/upload-pack.c b/upload-pack.c
index 04d6bd7..04231d7 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -428,7 +428,7 @@ static int get_common_commits(void)
save_commit_buffer = 0;
for (;;) {
- int len = packet_read_line(0, line, sizeof(line));
+ int len = packet_read(0, line, sizeof(line));
reset_timeout();
if (!len) {
@@ -589,7 +589,7 @@ static void receive_needs(void)
struct object *o;
const char *features;
unsigned char sha1_buf[20];
- len = packet_read_line(0, line, sizeof(line));
+ len = packet_read(0, line, sizeof(line));
reset_timeout();
if (!len)
break;
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (4 preceding siblings ...)
2013-02-18 9:22 ` [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/ Jeff King
@ 2013-02-18 9:26 ` Jeff King
2013-02-18 10:43 ` Jonathan Nieder
2013-02-18 9:27 ` [PATCHv2 07/10] teach get_remote_heads to read from a memory buffer Jeff King
` (5 subsequent siblings)
11 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:26 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
The packet_read function reads from a descriptor. The
packet_get_line function is similar, but reads from an
in-memory buffer, and uses a completely separate
implementation. This patch teaches the internal function
backing packet_read to accept either source, and use the
appropriate one. As a result:
1. The function has been renamed to packet_read_from_buf,
which more clearly indicates its relationship to the
packet_read function.
2. The original packet_get_line wrote to a strbuf; the new
function, like its descriptor counterpart, reads into a
buffer provided by the caller.
3. The original function did not die on any errors, but
instead returned an error code. Now we have the usual
"normal" and "gently" forms.
There are only two existing calls to packet_get_line which
have to be converted, and both are in remote-curl. The first
reads and checks the "# service=git-foo" line from a smart
http server. The second just reads past any additional
smart headers, without bothering to look at them.
This patch converts both to the new form, with a few
implications:
1. Because we use the non-gentle form, the first caller
can drop its own error checking. As a result, we will get
more accurate error reporting about protocol breakage,
since the errors come from inside the protocol code. We
will no longer print the URL as part of the error, but
that's OK. Protocol breakages should be rare (and we
are pretty sure at this point in the code that it is a
real smart server, so we won't be confused by dumb
servers), and the first debugging step would probably
be GIT_CURL_VERBOSE, anyway.
2. The second caller did not error check at all, and now
does. This can help us catch broken or truncated input
close to the source.
3. Since we are no longer using a strbuf, we now have a
1000-byte limit on the smart-http headers. That should
be fine, as the only header that has ever been sent
here is the short "service=git-foo" header.
Signed-off-by: Jeff King <peff@peff.net>
---
The diffstat shows more lines appearing, but it is mainly from comments
and from the various parse_line{_from_buf,}{,_gently} variants; we
really do get rid of a duplicate parsing implementation, and we
harmonize all of the error conditions and messages.
We can also make "gently" a parameter to avoid the proliferation of
related functions, but would mean all but one callsite would have to
pass an extra "0". Choose your poison, I guess.
pkt-line.c | 69 ++++++++++++++++++++++++++++-------------------------------
pkt-line.h | 11 +++++++++-
remote-curl.c | 19 ++++++++--------
3 files changed, 53 insertions(+), 46 deletions(-)
diff --git a/pkt-line.c b/pkt-line.c
index 85faf73..7ee91e0 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -103,12 +103,26 @@ static int safe_read(int fd, void *buffer, unsigned size, int gently)
strbuf_add(buf, buffer, n);
}
-static int safe_read(int fd, void *buffer, unsigned size, int gently)
+static int get_packet_data(int fd, char **src_buf, size_t *src_size,
+ void *dst, unsigned size, int gently)
{
- ssize_t ret = read_in_full(fd, buffer, size);
- if (ret < 0)
- die_errno("read error");
- else if (ret < size) {
+ ssize_t ret;
+
+ /* Read up to "size" bytes from our source, whatever it is. */
+ if (src_buf) {
+ ret = size < *src_size ? size : *src_size;
+ memcpy(dst, *src_buf, ret);
+ *src_buf += size;
+ *src_size -= size;
+ }
+ else {
+ ret = read_in_full(fd, dst, size);
+ if (ret < 0)
+ die_errno("read error");
+ }
+
+ /* And complain if we didn't get enough bytes to satisfy the read. */
+ if (ret < size) {
if (gently)
return -1;
@@ -143,12 +157,13 @@ static int packet_read_internal(int fd, char *buffer, unsigned size, int gently)
return len;
}
-static int packet_read_internal(int fd, char *buffer, unsigned size, int gently)
+static int packet_read_internal(int fd, char **src_buf, size_t *src_len,
+ char *buffer, unsigned size, int gently)
{
int len, ret;
char linelen[4];
- ret = safe_read(fd, linelen, 4, gently);
+ ret = get_packet_data(fd, src_buf, src_len, linelen, 4, gently);
if (ret < 0)
return ret;
len = packet_length(linelen);
@@ -162,7 +177,7 @@ static int packet_read_internal(int fd, char *buffer, unsigned size, int gently)
if (len >= size)
die("protocol error: line too large: (expected %u, got %d)",
size, len);
- ret = safe_read(fd, buffer, len, gently);
+ ret = get_packet_data(fd, src_buf, src_len, buffer, len, gently);
if (ret < 0)
return ret;
buffer[len] = 0;
@@ -172,40 +187,22 @@ int packet_get_line(struct strbuf *out,
int packet_read_gently(int fd, char *buffer, unsigned size)
{
- return packet_read_internal(fd, buffer, size, 1);
+ return packet_read_internal(fd, NULL, 0, buffer, size, 1);
}
int packet_read(int fd, char *buffer, unsigned size)
{
- return packet_read_internal(fd, buffer, size, 0);
+ return packet_read_internal(fd, NULL, 0, buffer, size, 0);
}
-int packet_get_line(struct strbuf *out,
- char **src_buf, size_t *src_len)
+int packet_read_from_buf(char *dst, unsigned dst_len,
+ char **src_buf, size_t *src_len)
{
- int len;
-
- if (*src_len < 4)
- return -1;
- len = packet_length(*src_buf);
- if (len < 0)
- return -1;
- if (!len) {
- *src_buf += 4;
- *src_len -= 4;
- packet_trace("0000", 4, 0);
- return 0;
- }
- if (*src_len < len)
- return -2;
-
- *src_buf += 4;
- *src_len -= 4;
- len -= 4;
+ return packet_read_internal(-1, src_buf, src_len, dst, dst_len, 0);
+}
- strbuf_add(out, *src_buf, len);
- *src_buf += len;
- *src_len -= len;
- packet_trace(out->buf, out->len, 0);
- return len;
+int packet_read_from_buf_gently(char *dst, unsigned dst_len,
+ char **src_buf, size_t *src_len)
+{
+ return packet_read_internal(-1, src_buf, src_len, dst, dst_len, 1);
}
diff --git a/pkt-line.h b/pkt-line.h
index 2dc4941..287a391 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -45,7 +45,16 @@ int packet_read_gently(int fd, char *buffer, unsigned size);
*/
int packet_read_gently(int fd, char *buffer, unsigned size);
-int packet_get_line(struct strbuf *out, char **src_buf, size_t *src_len);
+/*
+ * Same as packet_read above, but read from an in-memory buffer
+ * instead of a file descriptor. The src_buf and src_len are modified
+ * to iterate past the consumed data.
+ */
+int packet_read_from_buf(char *dst, unsigned dst_len,
+ char **src_buf, size_t *src_len);
+int packet_read_from_buf_gently(char *dst, unsigned dst_len,
+ char **src_buf, size_t *src_len);
+
ssize_t safe_write(int, const void *, ssize_t);
#endif
diff --git a/remote-curl.c b/remote-curl.c
index 99cc016..2ec5854 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -138,28 +138,29 @@ static struct discovery* discover_refs(const char *service)
if (maybe_smart &&
(5 <= last->len && last->buf[4] == '#') &&
!strbuf_cmp(&exp, &type)) {
+ char line[1000];
+ int len;
+
/*
* smart HTTP response; validate that the service
* pkt-line matches our request.
*/
- if (packet_get_line(&buffer, &last->buf, &last->len) <= 0)
- die("%s has invalid packet header", refs_url);
- if (buffer.len && buffer.buf[buffer.len - 1] == '\n')
- strbuf_setlen(&buffer, buffer.len - 1);
+ len = packet_read_from_buf(line, sizeof(line), &last->buf, &last->len);
+ if (len && line[len - 1] == '\n')
+ len--;
strbuf_reset(&exp);
strbuf_addf(&exp, "# service=%s", service);
- if (strbuf_cmp(&exp, &buffer))
- die("invalid server response; got '%s'", buffer.buf);
+ if (len != exp.len || memcmp(exp.buf, line, len))
+ die("invalid server response; got '%s'", line);
strbuf_release(&exp);
/* The header can include additional metadata lines, up
* until a packet flush marker. Ignore these now, but
* in the future we might start to scan them.
*/
- strbuf_reset(&buffer);
- while (packet_get_line(&buffer, &last->buf, &last->len) > 0)
- strbuf_reset(&buffer);
+ while (packet_read_from_buf(line, sizeof(line), &last->buf, &last->len) > 0)
+ ;
last->proto_git = 1;
}
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 07/10] teach get_remote_heads to read from a memory buffer
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (5 preceding siblings ...)
2013-02-18 9:26 ` [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation Jeff King
@ 2013-02-18 9:27 ` Jeff King
2013-02-18 9:29 ` [PATCHv2 08/10] remote-curl: pass buffer straight to get_remote_heads Jeff King
` (4 subsequent siblings)
11 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:27 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.
Signed-off-by: Jeff King <peff@peff.net>
---
There aren't that many callers of get_remote_heads, so I just added the
optional parameters and tweaked each callsite. We can do it as a
get_remote_heads_from_buf wrapper function and leave them be, if we
want.
builtin/fetch-pack.c | 2 +-
builtin/send-pack.c | 2 +-
cache.h | 4 +++-
connect.c | 10 +++++++---
remote-curl.c | 2 +-
transport.c | 6 +++---
6 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index b59e60e..bf850d0 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -128,7 +128,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
args.verbose ? CONNECT_VERBOSE : 0);
}
- get_remote_heads(fd[0], &ref, 0, NULL);
+ get_remote_heads(fd[0], NULL, 0, &ref, 0, NULL);
ref = fetch_pack(&args, fd, conn, ref, dest,
&sought, pack_lockfile_ptr);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 8778519..152c4ea 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -207,7 +207,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
memset(&extra_have, 0, sizeof(extra_have));
- get_remote_heads(fd[0], &remote_refs, REF_NORMAL, &extra_have);
+ get_remote_heads(fd[0], NULL, 0, &remote_refs, REF_NORMAL, &extra_have);
transport_verify_remote_names(nr_refspecs, refspecs);
diff --git a/cache.h b/cache.h
index e493563..db646a2 100644
--- a/cache.h
+++ b/cache.h
@@ -1049,7 +1049,9 @@ struct extra_have_objects {
int nr, alloc;
unsigned char (*array)[20];
};
-extern struct ref **get_remote_heads(int in, struct ref **list, unsigned int flags, struct extra_have_objects *);
+extern struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
+ struct ref **list, unsigned int flags,
+ struct extra_have_objects *);
extern int server_supports(const char *feature);
extern int parse_feature_request(const char *features, const char *feature);
extern const char *server_feature_value(const char *feature, int *len_ret);
diff --git a/connect.c b/connect.c
index 59266b1..28f21d0 100644
--- a/connect.c
+++ b/connect.c
@@ -62,8 +62,8 @@ static void die_initial_contact(int got_at_least_one_head)
/*
* Read all the refs from the other end
*/
-struct ref **get_remote_heads(int in, struct ref **list,
- unsigned int flags,
+struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
+ struct ref **list, unsigned int flags,
struct extra_have_objects *extra_have)
{
int got_at_least_one_head = 0;
@@ -76,7 +76,11 @@ struct ref **get_remote_heads(int in, struct ref **list,
char *name;
int len, name_len;
- len = packet_read_gently(in, buffer, sizeof(buffer));
+ if (src_buf)
+ len = packet_read_from_buf_gently(buffer, sizeof(buffer),
+ &src_buf, &src_len);
+ else
+ len = packet_read_gently(in, buffer, sizeof(buffer));
if (len < 0)
die_initial_contact(got_at_least_one_head);
diff --git a/remote-curl.c b/remote-curl.c
index 2ec5854..6e43463 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -195,7 +195,7 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
if (start_async(&async))
die("cannot start thread to parse advertised refs");
- get_remote_heads(async.out, &list,
+ get_remote_heads(async.out, NULL, 0, &list,
for_push ? REF_NORMAL : 0, NULL);
close(async.out);
if (finish_async(&async))
diff --git a/transport.c b/transport.c
index 886ffd8..62df466 100644
--- a/transport.c
+++ b/transport.c
@@ -507,7 +507,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
struct ref *refs;
connect_setup(transport, for_push, 0);
- get_remote_heads(data->fd[0], &refs,
+ get_remote_heads(data->fd[0], NULL, 0, &refs,
for_push ? REF_NORMAL : 0, &data->extra_have);
data->got_remote_heads = 1;
@@ -541,7 +541,7 @@ static int fetch_refs_via_pack(struct transport *transport,
if (!data->got_remote_heads) {
connect_setup(transport, 0, 0);
- get_remote_heads(data->fd[0], &refs_tmp, 0, NULL);
+ get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0, NULL);
data->got_remote_heads = 1;
}
@@ -799,7 +799,7 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
struct ref *tmp_refs;
connect_setup(transport, 1, 0);
- get_remote_heads(data->fd[0], &tmp_refs, REF_NORMAL, NULL);
+ get_remote_heads(data->fd[0], NULL, 0, &tmp_refs, REF_NORMAL, NULL);
data->got_remote_heads = 1;
}
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 08/10] remote-curl: pass buffer straight to get_remote_heads
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (6 preceding siblings ...)
2013-02-18 9:27 ` [PATCHv2 07/10] teach get_remote_heads to read from a memory buffer Jeff King
@ 2013-02-18 9:29 ` Jeff King
2013-02-18 10:47 ` Jonathan Nieder
2013-02-18 9:29 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Junio C Hamano
` (3 subsequent siblings)
11 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:29 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
Until recently, get_remote_heads only knew how to read refs
from a file descriptor. To hack around this, we spawned a
thread (or forked a process) to write the buffer back to us.
Now that we can just pass it our buffer directly, we don't
have to use this hack anymore.
Signed-off-by: Jeff King <peff@peff.net>
---
I don't know that this code was hurting anything, but it has always
struck me as ugly and a possible source of error. And now it's gone.
remote-curl.c | 26 ++------------------------
1 file changed, 2 insertions(+), 24 deletions(-)
diff --git a/remote-curl.c b/remote-curl.c
index 6e43463..f049da2 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -173,33 +173,11 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
return last;
}
-static int write_discovery(int in, int out, void *data)
-{
- struct discovery *heads = data;
- int err = 0;
- if (write_in_full(out, heads->buf, heads->len) != heads->len)
- err = 1;
- close(out);
- return err;
-}
-
static struct ref *parse_git_refs(struct discovery *heads, int for_push)
{
struct ref *list = NULL;
- struct async async;
-
- memset(&async, 0, sizeof(async));
- async.proc = write_discovery;
- async.data = heads;
- async.out = -1;
-
- if (start_async(&async))
- die("cannot start thread to parse advertised refs");
- get_remote_heads(async.out, NULL, 0, &list,
- for_push ? REF_NORMAL : 0, NULL);
- close(async.out);
- if (finish_async(&async))
- die("ref parsing thread failed");
+ get_remote_heads(-1, heads->buf, heads->len, &list,
+ for_push ? REF_NORMAL : 0, NULL);
return list;
}
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCHv2 0/10] pkt-line and remote-curl cleanups server
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (7 preceding siblings ...)
2013-02-18 9:29 ` [PATCHv2 08/10] remote-curl: pass buffer straight to get_remote_heads Jeff King
@ 2013-02-18 9:29 ` Junio C Hamano
2013-02-18 9:33 ` Jeff King
2013-02-18 9:29 ` [PATCHv2 09/10] remote-curl: move ref-parsing code up in file Jeff King
` (2 subsequent siblings)
11 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2013-02-18 9:29 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, Shawn O. Pearce
Jeff King <peff@peff.net> writes:
> On Sun, Feb 17, 2013 at 05:41:13PM -0800, Jonathan Nieder wrote:
>
>> > I don't think so. Don't ERR lines appear inside their own packets?
>>
>> Yes, I misread get_remote_heads for some reason. Thanks for checking.
>
> Thanks for bringing it up. I had not even thought about ERR at all. So
> it was luck rather than skill that I was right. :)
>
>> I'm not sure whether servers are expected to send a flush after an
>> ERR packet. The only codepath I know of in git itself that sends
>> such packets is git-daemon, which does not flush after the error (but
>> is not used in the stateless-rpc case). http-backend uses HTTP error
>> codes for its errors.
>
> I just checked, and GitHub also does not send flush packets after ERR.
> Which makes sense; ERR is supposed to end the conversation.
Hmph. A flush packet was supposed to be a mark to say "all the
packets before this one can be buffered and kept without getting
passed to write(2), but this and all such buffered data _must_ go on
the wire _now_". So in the sense, ERR not followed by a flush may
not even have a chance to be seen on the other end, no? That is
what the comment before the implementation of packet_flush() is all
about.
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCHv2 09/10] remote-curl: move ref-parsing code up in file
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (8 preceding siblings ...)
2013-02-18 9:29 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Junio C Hamano
@ 2013-02-18 9:29 ` Jeff King
2013-02-18 9:30 ` [PATCHv2 10/10] remote-curl: always parse incoming refs Jeff King
2013-02-20 7:05 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Shawn Pearce
11 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:29 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
The ref-parsing functions are static. Let's move them up in
the file to be available to more functions, which will help
us with later refactoring.
Signed-off-by: Jeff King <peff@peff.net>
---
Just a cleanup for the next patch. We could also just do extra
declarations at the top.
remote-curl.c | 117 +++++++++++++++++++++++++++++-----------------------------
1 file changed, 59 insertions(+), 58 deletions(-)
diff --git a/remote-curl.c b/remote-curl.c
index f049da2..62f82d1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -80,6 +80,65 @@ static struct discovery *last_discovery;
};
static struct discovery *last_discovery;
+static struct ref *parse_git_refs(struct discovery *heads, int for_push)
+{
+ struct ref *list = NULL;
+ get_remote_heads(-1, heads->buf, heads->len, &list,
+ for_push ? REF_NORMAL : 0, NULL);
+ return list;
+}
+
+static struct ref *parse_info_refs(struct discovery *heads)
+{
+ char *data, *start, *mid;
+ char *ref_name;
+ int i = 0;
+
+ struct ref *refs = NULL;
+ struct ref *ref = NULL;
+ struct ref *last_ref = NULL;
+
+ data = heads->buf;
+ start = NULL;
+ mid = data;
+ while (i < heads->len) {
+ if (!start) {
+ start = &data[i];
+ }
+ if (data[i] == '\t')
+ mid = &data[i];
+ if (data[i] == '\n') {
+ if (mid - start != 40)
+ die("%sinfo/refs not valid: is this a git repository?", url);
+ data[i] = 0;
+ ref_name = mid + 1;
+ ref = xmalloc(sizeof(struct ref) +
+ strlen(ref_name) + 1);
+ memset(ref, 0, sizeof(struct ref));
+ strcpy(ref->name, ref_name);
+ get_sha1_hex(start, ref->old_sha1);
+ if (!refs)
+ refs = ref;
+ if (last_ref)
+ last_ref->next = ref;
+ last_ref = ref;
+ start = NULL;
+ }
+ i++;
+ }
+
+ ref = alloc_ref("HEAD");
+ if (!http_fetch_ref(url, ref) &&
+ !resolve_remote_symref(ref, refs)) {
+ ref->next = refs;
+ refs = ref;
+ } else {
+ free(ref);
+ }
+
+ return refs;
+}
+
static void free_discovery(struct discovery *d)
{
if (d) {
@@ -173,64 +232,6 @@ static struct discovery* discover_refs(const char *service)
return last;
}
-static struct ref *parse_git_refs(struct discovery *heads, int for_push)
-{
- struct ref *list = NULL;
- get_remote_heads(-1, heads->buf, heads->len, &list,
- for_push ? REF_NORMAL : 0, NULL);
- return list;
-}
-
-static struct ref *parse_info_refs(struct discovery *heads)
-{
- char *data, *start, *mid;
- char *ref_name;
- int i = 0;
-
- struct ref *refs = NULL;
- struct ref *ref = NULL;
- struct ref *last_ref = NULL;
-
- data = heads->buf;
- start = NULL;
- mid = data;
- while (i < heads->len) {
- if (!start) {
- start = &data[i];
- }
- if (data[i] == '\t')
- mid = &data[i];
- if (data[i] == '\n') {
- if (mid - start != 40)
- die("%sinfo/refs not valid: is this a git repository?", url);
- data[i] = 0;
- ref_name = mid + 1;
- ref = xmalloc(sizeof(struct ref) +
- strlen(ref_name) + 1);
- memset(ref, 0, sizeof(struct ref));
- strcpy(ref->name, ref_name);
- get_sha1_hex(start, ref->old_sha1);
- if (!refs)
- refs = ref;
- if (last_ref)
- last_ref->next = ref;
- last_ref = ref;
- start = NULL;
- }
- i++;
- }
-
- ref = alloc_ref("HEAD");
- if (!http_fetch_ref(url, ref) &&
- !resolve_remote_symref(ref, refs)) {
- ref->next = refs;
- refs = ref;
- } else {
- free(ref);
- }
-
- return refs;
-}
static struct ref *get_refs(int for_push)
{
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCHv2 10/10] remote-curl: always parse incoming refs
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (9 preceding siblings ...)
2013-02-18 9:29 ` [PATCHv2 09/10] remote-curl: move ref-parsing code up in file Jeff King
@ 2013-02-18 9:30 ` Jeff King
2013-02-18 10:50 ` Jonathan Nieder
2013-02-20 7:05 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Shawn Pearce
11 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:30 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
When remote-curl receives a list of refs from a server, it
keeps the whole buffer intact. When we get a "list" command,
we feed the result to get_remote_heads, and when we get a
"fetch" or "push" command, we feed it to fetch-pack or
send-pack, respectively.
If the HTTP response from the server is truncated for any
reason, we will get an incomplete ref advertisement. If we
then feed this incomplete list to fetch-pack, one of a few
things may happen:
1. If the truncation is in a packet header, fetch-pack
will notice the bogus line and complain.
2. If the truncation is inside a packet, fetch-pack will
keep waiting for us to send the rest of the packet,
which we never will.
3. If the truncation is at a packet boundary, fetch-pack
will keep waiting for us to send the next packet, which
we never will.
As a result, fetch-pack hangs, waiting for input. However,
remote-curl believes it has sent all of the advertisement,
and therefore waits for fetch-pack to speak. The two
processes end up in a deadlock.
We do notice the broken ref list if we feed it to
get_remote_heads. So if git asks the helper to do a "list"
followed by a "fetch", we are safe; we'll abort during the
list operation, which parses the refs.
This patch teaches remote-curl to always parse and save the
incoming ref list when we read the ref advertisement from a
server. That means that we will always verify and abort
before even running fetch-pack (or send-pack) when reading a
corrupted list, even if we do not run the "list" command
explicitly.
Since we save the result, in the common case of running
"list" then "fetch", we do not do any extra parsing at all.
In the case of just a "fetch", we do an extra round of
parsing, but only once.
Note also that the "fetch" case will now also initialize
server_capabilities from the remote (in remote-curl; we
already would do so inside fetch-pack). Doing "list+fetch"
already does this. It doesn't actually matter now, but the
new behavior is arguably more correct, should remote-curl
ever start caring about the server's capability list.
Signed-off-by: Jeff King <peff@peff.net>
---
And this does the equivalent of patch 3/3 from the first series, but I
think this is much more robust (certainly it solves the ERR problem, but
more importantly, it uses the exact same function that other code paths
do, so we do not have to worry about it diverging).
remote-curl.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/remote-curl.c b/remote-curl.c
index 62f82d1..33af198 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -76,6 +76,7 @@ struct discovery {
char *buf_alloc;
char *buf;
size_t len;
+ struct ref *refs;
unsigned proto_git : 1;
};
static struct discovery *last_discovery;
@@ -145,11 +146,12 @@ static void free_discovery(struct discovery *d)
if (d == last_discovery)
last_discovery = NULL;
free(d->buf_alloc);
+ free_refs(d->refs);
free(d);
}
}
-static struct discovery* discover_refs(const char *service)
+static struct discovery* discover_refs(const char *service, int for_push)
{
struct strbuf exp = STRBUF_INIT;
struct strbuf type = STRBUF_INIT;
@@ -224,6 +226,11 @@ static struct discovery* discover_refs(const char *service)
last->proto_git = 1;
}
+ if (last->proto_git)
+ last->refs = parse_git_refs(last, for_push);
+ else
+ last->refs = parse_info_refs(last);
+
free(refs_url);
strbuf_release(&exp);
strbuf_release(&type);
@@ -232,19 +239,16 @@ static struct ref *get_refs(int for_push)
return last;
}
-
static struct ref *get_refs(int for_push)
{
struct discovery *heads;
if (for_push)
- heads = discover_refs("git-receive-pack");
+ heads = discover_refs("git-receive-pack", for_push);
else
- heads = discover_refs("git-upload-pack");
+ heads = discover_refs("git-upload-pack", for_push);
- if (heads->proto_git)
- return parse_git_refs(heads, for_push);
- return parse_info_refs(heads);
+ return heads->refs;
}
static void output_refs(struct ref *refs)
@@ -258,7 +262,6 @@ static void output_refs(struct ref *refs)
}
printf("\n");
fflush(stdout);
- free_refs(refs);
}
struct rpc_state {
@@ -674,7 +677,7 @@ static int fetch(int nr_heads, struct ref **to_fetch)
static int fetch(int nr_heads, struct ref **to_fetch)
{
- struct discovery *d = discover_refs("git-upload-pack");
+ struct discovery *d = discover_refs("git-upload-pack", 0);
if (d->proto_git)
return fetch_git(d, nr_heads, to_fetch);
else
@@ -793,7 +796,7 @@ static int push(int nr_spec, char **specs)
static int push(int nr_spec, char **specs)
{
- struct discovery *heads = discover_refs("git-receive-pack");
+ struct discovery *heads = discover_refs("git-receive-pack", 1);
int ret;
if (heads->proto_git)
--
1.8.1.20.g7078b03
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCHv2 0/10] pkt-line and remote-curl cleanups server
2013-02-18 9:29 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Junio C Hamano
@ 2013-02-18 9:33 ` Jeff King
2013-02-18 9:49 ` Junio C Hamano
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Nieder, git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 01:29:16AM -0800, Junio C Hamano wrote:
> > I just checked, and GitHub also does not send flush packets after ERR.
> > Which makes sense; ERR is supposed to end the conversation.
>
> Hmph. A flush packet was supposed to be a mark to say "all the
> packets before this one can be buffered and kept without getting
> passed to write(2), but this and all such buffered data _must_ go on
> the wire _now_". So in the sense, ERR not followed by a flush may
> not even have a chance to be seen on the other end, no? That is
> what the comment before the implementation of packet_flush() is all
> about.
Despite the name, I always thought of packet_flush as more of a signal
for the _receiver_, who then knows that they have gotten all of a
particular list. In other words, we seem to use it as a sequence point
as much as anything (mostly because we immediately write() out any other
packets anyway, so there is no flushing to be done; but perhaps there
were originally thoughts that we could do more buffering on the writing
side).
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 9:22 ` [PATCHv2 04/10] pkt-line: change error message for oversized packet Jeff King
@ 2013-02-18 9:40 ` Junio C Hamano
2013-02-18 9:49 ` Jeff King
2013-02-18 10:15 ` Jonathan Nieder
1 sibling, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2013-02-18 9:40 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, Shawn O. Pearce
Jeff King <peff@peff.net> writes:
> I'm really tempted to bump all of our 1000-byte buffers to just use
> LARGE_PACKET_MAX. If we provided a packet_read variant that used a
> static buffer (which is fine for all but one or two callers), then it
> would not take much memory...
I thought that 1000-byte limit was kept when we introduced the 64k
interface to interoperate with other side that does not yet support
the 64k interface. Is your justification that such an old version of
Git no longer matters in the real world (which is true, I think), or
we use 1000-byte limit in some codepaths even when we know that we
are talking with a 64k-capable version of Git on the other side?
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 0/10] pkt-line and remote-curl cleanups server
2013-02-18 9:33 ` Jeff King
@ 2013-02-18 9:49 ` Junio C Hamano
2013-02-18 9:55 ` Jeff King
2013-02-20 7:14 ` Shawn Pearce
0 siblings, 2 replies; 50+ messages in thread
From: Junio C Hamano @ 2013-02-18 9:49 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, Shawn O. Pearce
Jeff King <peff@peff.net> writes:
> On Mon, Feb 18, 2013 at 01:29:16AM -0800, Junio C Hamano wrote:
>
>> > I just checked, and GitHub also does not send flush packets after ERR.
>> > Which makes sense; ERR is supposed to end the conversation.
>>
>> Hmph. A flush packet was supposed to be a mark to say "all the
>> packets before this one can be buffered and kept without getting
>> passed to write(2), but this and all such buffered data _must_ go on
>> the wire _now_". So in the sense, ERR not followed by a flush may
>> not even have a chance to be seen on the other end, no? That is
>> what the comment before the implementation of packet_flush() is all
>> about.
>
> Despite the name, I always thought of packet_flush as more of a signal
> for the _receiver_, who then knows that they have gotten all of a
> particular list. In other words, we seem to use it as a sequence point
> as much as anything (mostly because we immediately write() out any other
> packets anyway, so there is no flushing to be done; but perhaps there
> were originally thoughts that we could do more buffering on the writing
> side).
Exactly.
The logic behind the comment "we do not buffer, but we should" is
that flush tells the receiver that the sending end is "done" feeding
it a class of data, and the data before flush do not have to reach
the receiver immediately, hence we can afford to buffer on the
sending end if we can measure that doing so would give us better
performance. We haven't measure and turned buffering on yet.
But when dying, we may of course have data before flushing. We may
disconnect (by dying) without showing flush (or preceding ERR) to
the other side, and for that reason, not relying on the flush packet
on the receiving end is a good change. Once we turn buffering on, we
probably need to be careful when sending an ERR indicator by making
it always drain any buffered data and show the ERR indicator without
buffering, or something.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 9:40 ` Junio C Hamano
@ 2013-02-18 9:49 ` Jeff King
2013-02-18 21:25 ` Junio C Hamano
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Nieder, git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 01:40:17AM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > I'm really tempted to bump all of our 1000-byte buffers to just use
> > LARGE_PACKET_MAX. If we provided a packet_read variant that used a
> > static buffer (which is fine for all but one or two callers), then it
> > would not take much memory...
>
> I thought that 1000-byte limit was kept when we introduced the 64k
> interface to interoperate with other side that does not yet support
> the 64k interface. Is your justification that such an old version of
> Git no longer matters in the real world (which is true, I think), or
> we use 1000-byte limit in some codepaths even when we know that we
> are talking with a 64k-capable version of Git on the other side?
I should have been more clear that I want to bump only the _reading_
side, not the writing.
The sideband-64k capability bumps the sideband chunk size. But the size
for packetized ref advertisement lines has remained at 1000. Which it
must, since we start outputting them before doing capability
negotiation. The sender will die before writing a longer ref line (see
pkg-line.c:format_packet), and most of the reading callsites feed a
1000-byte buffer to packet_read_line (which will die if we get a larger
packet). So the upgrade path for that would be:
1. Git bumps up all reading buffers to LARGE_PACKET_MAX, just in case.
This immediately covers us for any alternate implementations that
send larger ref packets (I do not know if any exist, but the
documentation does not mention this limitation at all, so I would
not be surprised if other implementations just use LARGE_PACKET_MAX
as a maximum).
2. Many years pass. We decide that Git v1.8.2 and older are ancient
history and not worth caring about.
3. We bump the 1000-byte limit for format_packet to LARGE_PACKET_MAX.
This is not pressing at all; I wouldn't have even noticed it if I hadn't
been wondering how large to make the new buffer I was adding in a later
patch. I have not heard of anyone running up against this limit in
practice. But it's easy to do (1), and it starts the clock ticking for
the 1000-byte readers to become obsolete.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 0/10] pkt-line and remote-curl cleanups server
2013-02-18 9:49 ` Junio C Hamano
@ 2013-02-18 9:55 ` Jeff King
2013-02-20 7:14 ` Shawn Pearce
1 sibling, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 9:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Nieder, git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 01:49:37AM -0800, Junio C Hamano wrote:
> The logic behind the comment "we do not buffer, but we should" is
> that flush tells the receiver that the sending end is "done" feeding
> it a class of data, and the data before flush do not have to reach
> the receiver immediately, hence we can afford to buffer on the
> sending end if we can measure that doing so would give us better
> performance. We haven't measure and turned buffering on yet.
>
> But when dying, we may of course have data before flushing. We may
> disconnect (by dying) without showing flush (or preceding ERR) to
> the other side, and for that reason, not relying on the flush packet
> on the receiving end is a good change. Once we turn buffering on, we
> probably need to be careful when sending an ERR indicator by making
> it always drain any buffered data and show the ERR indicator without
> buffering, or something.
Yeah, I'd agree. An ERR packet should always be accompanied by a flush
(whether an actual flush packet or not doesn't really matter, but
definitely a buffer flush on the sender). But I think that is really the
sender's problem, and we can deal with it there if and when we buffer.
>From the receiver's end, they simply want to be liberal in interpreting
an ERR as the end of the data stream. They do not have to be picky about
whether it is followed by more data, or by a flush packet, or whatever.
But that is not a change I am introducing; that is how get_remote_heads
has always worked. I am merely correcting the proposed change from v1 of
the series that did not correctly take that into account (and therefore
regressed the error message in the ERR case).
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 02/10] pkt-line: drop safe_write function
2013-02-18 9:15 ` [PATCHv2 02/10] pkt-line: drop safe_write function Jeff King
@ 2013-02-18 9:56 ` Jonathan Nieder
2013-02-18 10:24 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 9:56 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> This is just write_or_die by another name.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Actually, they are not quite the same. write_or_die will exit(0) when it
> sees EPIPE.
That information definitely belongs in the commit message.
> Which makes me a little nervous.
Me, too. Let's see:
[...]
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -908,7 +908,7 @@ static void report(struct command *commands, const char *unpack_status)
> if (use_sideband)
> send_sideband(1, 1, buf.buf, buf.len, use_sideband);
> else
> - safe_write(1, buf.buf, buf.len);
> + write_or_die(1, buf.buf, buf.len);
If the connection to send-pack is lost and stdout becomes a broken
pipe and I am updating enough refs to overflow the pipe buffer,
receive-pack will die with SIGPIPE. So unless the sadistic caller has
set the inherited SIGPIPE action to SIG_IGN (for example by wrapping
git with an uncautious Python wrapper that uses subprocess.Popen), the
change to EPIPE handling is not a behavior change.
Since the pipe is closed, presumably the calling send-pack has hung up
and won't notice the exit status, so this should be safe.
Arguably it would be more friendly to stay alive to run the
post-receive and post-update hooks, though, given that a ref update
has occurred. Maybe transport commands like this one should always
set the disposition of SIGPIPE to SIG_IGN.
[...]
> --- a/builtin/send-pack.c
> +++ b/builtin/send-pack.c
> @@ -79,7 +79,7 @@ static void print_helper_status(struct ref *ref)
> }
> strbuf_addch(&buf, '\n');
>
> - safe_write(1, buf.buf, buf.len);
> + write_or_die(1, buf.buf, buf.len);
A signal will kill send-pack before write_or_die has a chance to
intervene so this change is a no-op unless the caller is sadistic
(as in the [1] case). In the signal(SIGPIPE, SIG_IGN) case, it might
be a regression, since "git push" should not declare success when its
connection to receive-pack closes early.
[1] http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2009-07-02-python-sigpipe.html
[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -245,7 +245,7 @@ static void send_request(struct fetch_pack_args *args,
> send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
> packet_flush(fd);
> } else
> - safe_write(fd, buf->buf, buf->len);
> + write_or_die(fd, buf->buf, buf->len);
Also a no-op except when the parent process is insane enough to let us
inherit signal(SIGPIPE, SIG_IGN).
In that case, if triggerable this looks like a bad change: if
upload-pack has gone missing, the fetch should not be considered a
success.
[...]
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -70,7 +70,7 @@ static void format_write(int fd, const char *fmt, ...)
> if (n >= sizeof(buffer))
> die("protocol error: impossibly long line");
>
> - safe_write(fd, buffer, n);
> + write_or_die(fd, buffer, n);
Etc. I'm stopping here.
I'm thinking before a patch like this we should make the following
change:
1. at startup, set the signal action of SIGPIPE to SIG_DFL, to make
the behavior a little more predictable.
Perhaps the following as well:
2. in write_or_die(), when encountering EPIPE, set the signal action
of SIGPIPE to SIG_DFL and raise(SIGPIPE), ensuring the exit status
reflects the broken pipe. If the parent process is unnecessarily
noisy about that, that's a bug in the parent process (hopefully
uncommon).
Or alternatively:
2b. never set SIGPIPE to SIG_IGN except in short blocks of code that
do not call write_or_die()
What do you think?
Jonathan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 03/10] pkt-line: clean up "gentle" reading function
2013-02-18 9:16 ` [PATCHv2 03/10] pkt-line: clean up "gentle" reading function Jeff King
@ 2013-02-18 10:12 ` Jonathan Nieder
2013-02-18 10:25 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 10:12 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> Originally we had a single function for reading packetized
> data: packet_read_line. Commit 46284dd grew a more "gentle"
> form that would return an error instead of dying upon
> reading a truncated input stream. However:
In other words:
Based on the names of two functions "packet_read" and
"packet_read_line", it is not obvious which to use and what the
ramifications of that choice are.
Rename packet_read to packet_read_line_gently and add a comment
explaining that the latter is a "gentler" form that returns an
error instead of dying upon reading a truncated input stream.
While at it:
* Rename the internal argument triggering the gentle mode to
"gentle" instead of "return_line_fail".
* Drop the redundant "return_line_fail &&" in checks like
"if (return_line_fail && ret < 0)". safe_read() never
returns an error when !gentle.
No functional change intended.
FWIW, the patch itself is
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 9:22 ` [PATCHv2 04/10] pkt-line: change error message for oversized packet Jeff King
2013-02-18 9:40 ` Junio C Hamano
@ 2013-02-18 10:15 ` Jonathan Nieder
2013-02-18 10:26 ` Jeff King
1 sibling, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 10:15 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -160,7 +160,8 @@ static int packet_read_internal(int fd, char *buffer, unsigned size, int gently)
> }
> len -= 4;
> if (len >= size)
> - die("protocol error: bad line length %d", len);
> + die("protocol error: line too large: (expected %u, got %d)",
> + size, len);
Makes sense. I think this should say "expected < %u, got %d", since we
don't actually expect most lines to be 1004 bytes in practice.
With or without such a change,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/
2013-02-18 9:22 ` [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/ Jeff King
@ 2013-02-18 10:19 ` Jonathan Nieder
2013-02-18 10:29 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 10:19 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> Originally packets were used just for the line-oriented ref
> advertisement and negotiation. These days, we also stuff
> packfiles and sidebands into them, and they do not
> necessarily represent a line. Drop the "_line" suffix, as it
> is not informative and makes the function names quite long
> (especially as we add "_gently" and other variants).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Again, this is a taste issue. Can be optional.
In combination with patch 3, this changes the meaning of packet_read()
without changing its signature, which could make other patches
cherry-picked on top change behavior in unpredictable ways. :(
So I'd be all for this if the signature changes (for example to put
the fd at the end or something), but not so if not.
Thanks,
Jonathan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 02/10] pkt-line: drop safe_write function
2013-02-18 9:56 ` Jonathan Nieder
@ 2013-02-18 10:24 ` Jeff King
0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 10:24 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 01:56:45AM -0800, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > This is just write_or_die by another name.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > Actually, they are not quite the same. write_or_die will exit(0) when it
> > sees EPIPE.
>
> That information definitely belongs in the commit message.
Yeah, this one was more RFC; I was hoping for some input on the EPIPE
thing, so I could know _what_ to put in the commit message. Is it safe,
and if so, why? And if not, we should drop the patch.
> If the connection to send-pack is lost and stdout becomes a broken
> pipe and I am updating enough refs to overflow the pipe buffer,
> receive-pack will die with SIGPIPE. So unless the sadistic caller has
> set the inherited SIGPIPE action to SIG_IGN (for example by wrapping
> git with an uncautious Python wrapper that uses subprocess.Popen), the
> change to EPIPE handling is not a behavior change.
Yeah, but I don't want to count on always catching SIGPIPE. There's the
inherited signal handler thing, but there's also the fact that we may
end up ignoring SIGPIPE from backend programs like upload-pack and
receive-pack; they check their writes anyway, and we have already run
into issues with getting SIGPIPE when we don't necessarily expect or
care about it.
The nice thing about write_or_die is that it still _exits_ on EPIPE.
It's just that it doesn't print an error (which is really not a big
deal) and exit with a 0 return code. I really wonder if we should just
change the latter. For programs which are creating copious output (e.g.,
"git log"), the return value is not important anyway. For backend
programs, an unexpected EPIPE from something like write_or_die should
probably involve a non-successful return code.
> Arguably it would be more friendly to stay alive to run the
> post-receive and post-update hooks, though, given that a ref update
> has occurred. Maybe transport commands like this one should always
> set the disposition of SIGPIPE to SIG_IGN.
Yeah, I've suggested that in the past. And I do think it's sane, because
if you took a ref update, you almost certainly want to run the
post-receive, even if the client is no longer around (e.g., if it is
going to email out the changeset).
> [...]
> > --- a/builtin/send-pack.c
> > +++ b/builtin/send-pack.c
> > @@ -79,7 +79,7 @@ static void print_helper_status(struct ref *ref)
> > }
> > strbuf_addch(&buf, '\n');
> >
> > - safe_write(1, buf.buf, buf.len);
> > + write_or_die(1, buf.buf, buf.len);
>
> A signal will kill send-pack before write_or_die has a chance to
> intervene so this change is a no-op unless the caller is sadistic
> (as in the [1] case). In the signal(SIGPIPE, SIG_IGN) case, it might
> be a regression, since "git push" should not declare success when its
> connection to receive-pack closes early.
But that isn't going to receive-pack, is it? Send-pack's stdout is
really just going to the user (or wherever). So it would have an effect
more for:
(git push && echo >&2 OK) | grep -m1 foo
which might print "OK" even if we failed. That's quite contrived, but it
is at least a measurable change. And anyway...
> In that case, if triggerable this looks like a bad change: if
> upload-pack has gone missing, the fetch should not be considered a
> success.
> [...]
> Etc. I'm stopping here.
Yeah, there are definitely some bad ones.
> I'm thinking before a patch like this we should make the following
> change:
>
> 1. at startup, set the signal action of SIGPIPE to SIG_DFL, to make
> the behavior a little more predictable.
I'm lukewarm on that, just because we may want to ignore SIGPIPE
ourselves at some point.
> Perhaps the following as well:
>
> 2. in write_or_die(), when encountering EPIPE, set the signal action
> of SIGPIPE to SIG_DFL and raise(SIGPIPE), ensuring the exit status
> reflects the broken pipe. If the parent process is unnecessarily
> noisy about that, that's a bug in the parent process (hopefully
> uncommon).
I like this. My suggestion would be to just exit(1) instead of exit(0).
But really, raising SIGPIPE makes the most sense, because it
communicates to the parent what happened (and the shell will wisely not
print a message, but careful parents like "git fetch" and "git push"
will check it properly and notice that the child did not succeed).
> Or alternatively:
>
> 2b. never set SIGPIPE to SIG_IGN except in short blocks of code that
> do not call write_or_die()
Yuck. :)
> What do you think?
I really like option 2. That exit(0) when we see SIGPIPE bugs me.
Because we _are_ dying due to our write failing, and I think the only
reason to exit(0) was to avoid unnecessary complaints from parents. But
raising SIGPIPE seems like the best of both worlds to me.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 03/10] pkt-line: clean up "gentle" reading function
2013-02-18 10:12 ` Jonathan Nieder
@ 2013-02-18 10:25 ` Jeff King
0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 10:25 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 02:12:09AM -0800, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > Originally we had a single function for reading packetized
> > data: packet_read_line. Commit 46284dd grew a more "gentle"
> > form that would return an error instead of dying upon
> > reading a truncated input stream. However:
>
> In other words:
Hmph. I had originally written a commit message organized more like
yours, then I changed it to try to be more clear. I guess that didn't
work.
But yes, you get the intent exactly.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 10:15 ` Jonathan Nieder
@ 2013-02-18 10:26 ` Jeff King
0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2013-02-18 10:26 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 02:15:23AM -0800, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -160,7 +160,8 @@ static int packet_read_internal(int fd, char *buffer, unsigned size, int gently)
> > }
> > len -= 4;
> > if (len >= size)
> > - die("protocol error: bad line length %d", len);
> > + die("protocol error: line too large: (expected %u, got %d)",
> > + size, len);
>
> Makes sense. I think this should say "expected < %u, got %d", since we
> don't actually expect most lines to be 1004 bytes in practice.
Yeah, I had toyed with writing "expected max %u" for the same reason.
I'll tweak it in the re-roll.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/
2013-02-18 10:19 ` Jonathan Nieder
@ 2013-02-18 10:29 ` Jeff King
2013-02-18 11:05 ` Jonathan Nieder
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 10:29 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 02:19:15AM -0800, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > Originally packets were used just for the line-oriented ref
> > advertisement and negotiation. These days, we also stuff
> > packfiles and sidebands into them, and they do not
> > necessarily represent a line. Drop the "_line" suffix, as it
> > is not informative and makes the function names quite long
> > (especially as we add "_gently" and other variants).
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > Again, this is a taste issue. Can be optional.
>
> In combination with patch 3, this changes the meaning of packet_read()
> without changing its signature, which could make other patches
> cherry-picked on top change behavior in unpredictable ways. :(
>
> So I'd be all for this if the signature changes (for example to put
> the fd at the end or something), but not so if not.
True. Though packet_read has only existed since last June, only had one
callsite (which would now conflict, since I'm touching it in this
series), and has no new calls in origin..origin/pu. So it's relatively
low risk for such a problem. I don't know how careful we want to be.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation
2013-02-18 9:26 ` [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation Jeff King
@ 2013-02-18 10:43 ` Jonathan Nieder
2013-02-18 10:48 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 10:43 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> The packet_read function reads from a descriptor.
Ah, so this introduces a new analagous helper that reads from
a strbuf, to avoid the copy-from-async-procedure hack?
[...]
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -103,12 +103,26 @@ static int safe_read(int fd, void *buffer, unsigned size, int gently)
> strbuf_add(buf, buffer, n);
> }
>
> -static int safe_read(int fd, void *buffer, unsigned size, int gently)
> +static int get_packet_data(int fd, char **src_buf, size_t *src_size,
> + void *dst, unsigned size, int gently)
> {
> - ssize_t ret = read_in_full(fd, buffer, size);
> - if (ret < 0)
> - die_errno("read error");
> - else if (ret < size) {
> + ssize_t ret;
> +
> + /* Read up to "size" bytes from our source, whatever it is. */
> + if (src_buf) {
> + ret = size < *src_size ? size : *src_size;
> + memcpy(dst, *src_buf, ret);
> + *src_buf += size;
> + *src_size -= size;
> + }
> + else {
Style: git cuddles its "else"s.
assert(src_buf ? fd < 0 : fd >= 0);
if (src_buf) {
...
} else {
...
}
> + ret = read_in_full(fd, dst, size);
> + if (ret < 0)
> + die_errno("read error");
This is noisy about upstream pipe gone missing, which makes sense
since this is transport-related. Maybe that deserves a comment.
[...]
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -138,28 +138,29 @@ static struct discovery* discover_refs(const char *service)
> if (maybe_smart &&
> (5 <= last->len && last->buf[4] == '#') &&
> !strbuf_cmp(&exp, &type)) {
> + char line[1000];
> + int len;
> +
> /*
> * smart HTTP response; validate that the service
> * pkt-line matches our request.
> */
> - if (packet_get_line(&buffer, &last->buf, &last->len) <= 0)
> - die("%s has invalid packet header", refs_url);
> - if (buffer.len && buffer.buf[buffer.len - 1] == '\n')
> - strbuf_setlen(&buffer, buffer.len - 1);
> + len = packet_read_from_buf(line, sizeof(line), &last->buf, &last->len);
> + if (len && line[len - 1] == '\n')
> + len--;
Was anything guaranteeing that buffer.len < 1000 before this change?
The rest looks good from a quick glance.
Jonathan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 08/10] remote-curl: pass buffer straight to get_remote_heads
2013-02-18 9:29 ` [PATCHv2 08/10] remote-curl: pass buffer straight to get_remote_heads Jeff King
@ 2013-02-18 10:47 ` Jonathan Nieder
0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 10:47 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> I don't know that this code was hurting anything, but it has always
> struck me as ugly and a possible source of error. And now it's gone.
Heh. Belongs in the commit message, presumably.
I don't think the async procedure was very harmful, but it's nice to
avoid the cost of a new thread and some copying.
> remote-curl.c | 26 ++------------------------
> 1 file changed, 2 insertions(+), 24 deletions(-)
Nice. The patch looks sane.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation
2013-02-18 10:43 ` Jonathan Nieder
@ 2013-02-18 10:48 ` Jeff King
2013-02-18 10:54 ` Jonathan Nieder
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 10:48 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 02:43:50AM -0800, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > The packet_read function reads from a descriptor.
>
> Ah, so this introduces a new analagous helper that reads from
> a strbuf, to avoid the copy-from-async-procedure hack?
Not from a strbuf, but basically, yes.
> > + ret = read_in_full(fd, dst, size);
> > + if (ret < 0)
> > + die_errno("read error");
>
> This is noisy about upstream pipe gone missing, which makes sense
> since this is transport-related. Maybe that deserves a comment.
That is not new code; it is just reindented from the original safe_read.
But is it noisy about a missing pipe? We do not get EPIPE for reading.
We should just get a short read or EOF, both of which is handled later.
> > + len = packet_read_from_buf(line, sizeof(line), &last->buf, &last->len);
> > + if (len && line[len - 1] == '\n')
> > + len--;
>
> Was anything guaranteeing that buffer.len < 1000 before this change?
No. That's discussed in point (3) of the "implications" in the commit
message.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 10/10] remote-curl: always parse incoming refs
2013-02-18 9:30 ` [PATCHv2 10/10] remote-curl: always parse incoming refs Jeff King
@ 2013-02-18 10:50 ` Jonathan Nieder
2013-02-20 7:41 ` Shawn Pearce
0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 10:50 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> remote-curl.c | 23 +++++++++++++----------
> 1 file changed, 13 insertions(+), 10 deletions(-)
I like.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation
2013-02-18 10:48 ` Jeff King
@ 2013-02-18 10:54 ` Jonathan Nieder
0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 10:54 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> But is it noisy about a missing pipe? We do not get EPIPE for reading.
Ah, that makes more sense.
[...]
>>> + len = packet_read_from_buf(line, sizeof(line), &last->buf, &last->len);
>>> + if (len && line[len - 1] == '\n')
>>> + len--;
>>
>> Was anything guaranteeing that buffer.len < 1000 before this change?
>
> No. That's discussed in point (3) of the "implications" in the commit
> message.
Thanks. Sorry I missed it the first time.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/
2013-02-18 10:29 ` Jeff King
@ 2013-02-18 11:05 ` Jonathan Nieder
0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Nieder @ 2013-02-18 11:05 UTC (permalink / raw)
To: Jeff King; +Cc: git, Shawn O. Pearce
Jeff King wrote:
> On Mon, Feb 18, 2013 at 02:19:15AM -0800, Jonathan Nieder wrote:
>> In combination with patch 3, this changes the meaning of packet_read()
>> without changing its signature, which could make other patches
>> cherry-picked on top change behavior in unpredictable ways. :(
>>
>> So I'd be all for this if the signature changes (for example to put
>> the fd at the end or something), but not so if not.
>
> True. Though packet_read has only existed since last June, only had one
> callsite (which would now conflict, since I'm touching it in this
> series), and has no new calls in origin..origin/pu. So it's relatively
> low risk for such a problem. I don't know how careful we want to be.
I was unclear. What I am worried about is that someone using a
version of git without this patch will try some yet-to-be-written
patch using packet_read from the mailing list and not notice that they
are using the wrong function. For example, if someone is using
1.7.12.y or 1.8.1.y and wants to try a patch from after the above,
they would get subtly different and wrong results.
The rule "change the name or signature when breaking the ABI of a
global function" is easy to remember and follow. I think we want not
to have to be careful at all, and such rules can help with that. :)
Thanks,
Jonathan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 9:49 ` Jeff King
@ 2013-02-18 21:25 ` Junio C Hamano
2013-02-18 21:33 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2013-02-18 21:25 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, Shawn O. Pearce
Jeff King <peff@peff.net> writes:
> But it's easy to do (1), and it starts the clock ticking for
> the 1000-byte readers to become obsolete.
Yup, I agree with that goal.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 21:25 ` Junio C Hamano
@ 2013-02-18 21:33 ` Jeff King
2013-02-20 8:47 ` Jeff King
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-18 21:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Nieder, git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 01:25:35PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > But it's easy to do (1), and it starts the clock ticking for
> > the 1000-byte readers to become obsolete.
>
> Yup, I agree with that goal.
Having just looked at the pkt-line callers a lot, I think most of them
could go for something like:
char *packet_read(int fd, unsigned *len_p)
{
static char buffer[LARGE_PACKET_MAX];
int len;
len = packet_read_to_buf(fd, buffer, sizeof(buffer));
if (len < 0)
return NULL;
*len_p = len;
return buffer;
}
That would actually simplify the callers a bit, and would harmonize the
buffer sizes at the same time. I'll look into doing a series tonight.
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 0/10] pkt-line and remote-curl cleanups server
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
` (10 preceding siblings ...)
2013-02-18 9:30 ` [PATCHv2 10/10] remote-curl: always parse incoming refs Jeff King
@ 2013-02-20 7:05 ` Shawn Pearce
11 siblings, 0 replies; 50+ messages in thread
From: Shawn Pearce @ 2013-02-20 7:05 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git
On Mon, Feb 18, 2013 at 1:12 AM, Jeff King <peff@peff.net> wrote:
> On Sun, Feb 17, 2013 at 05:41:13PM -0800, Jonathan Nieder wrote:
>
>> > I don't think so. Don't ERR lines appear inside their own packets?
>>
>> Yes, I misread get_remote_heads for some reason. Thanks for checking.
>
> Thanks for bringing it up. I had not even thought about ERR at all. So
> it was luck rather than skill that I was right. :)
>
>> I'm not sure whether servers are expected to send a flush after an
>> ERR packet. The only codepath I know of in git itself that sends
>> such packets is git-daemon, which does not flush after the error (but
>> is not used in the stateless-rpc case). http-backend uses HTTP error
>> codes for its errors.
>
> I just checked, and GitHub also does not send flush packets after ERR.
> Which makes sense; ERR is supposed to end the conversation. I can change
> GitHub, of course, but who knows what other implementations exist (e.g.,
> I do not know off-hand whether gitolite has custom ERR responses). So it
> seems pretty clear that just checking for a flush packet is not the
> right thing, and we need to actually parse the packet contents (at least
> to some degree).
JGit (and by extension Gerrit Code Review, android.googlesource.com)
sends ERR with no flush-pkt. I would like to sort of keep the protocol
this way, given how many servers in the wild are running Gerrit and
currently use ERR with no flush-pkt. IMHO its a little late to be
closing that door and stuffing a flush-pkt after the ERR that ends the
conversation.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 0/10] pkt-line and remote-curl cleanups server
2013-02-18 9:49 ` Junio C Hamano
2013-02-18 9:55 ` Jeff King
@ 2013-02-20 7:14 ` Shawn Pearce
1 sibling, 0 replies; 50+ messages in thread
From: Shawn Pearce @ 2013-02-20 7:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Jonathan Nieder, git
On Mon, Feb 18, 2013 at 1:49 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> On Mon, Feb 18, 2013 at 01:29:16AM -0800, Junio C Hamano wrote:
>>
>>> > I just checked, and GitHub also does not send flush packets after ERR.
>>> > Which makes sense; ERR is supposed to end the conversation.
>>>
>>> Hmph. A flush packet was supposed to be a mark to say "all the
>>> packets before this one can be buffered and kept without getting
>>> passed to write(2), but this and all such buffered data _must_ go on
>>> the wire _now_". So in the sense, ERR not followed by a flush may
>>> not even have a chance to be seen on the other end, no? That is
>>> what the comment before the implementation of packet_flush() is all
>>> about.
>>
>> Despite the name, I always thought of packet_flush as more of a signal
>> for the _receiver_, who then knows that they have gotten all of a
>> particular list. In other words, we seem to use it as a sequence point
>> as much as anything (mostly because we immediately write() out any other
>> packets anyway, so there is no flushing to be done; but perhaps there
>> were originally thoughts that we could do more buffering on the writing
>> side).
>
> Exactly.
This is also my understanding of the flush-pkt ("0000"). Its an
end-of-list/end-of-section marker to let the peer know the protocol is
moving to the next state. Except where the protocol can move to the
next state without a flush-pkt of course (see below).
> The logic behind the comment "we do not buffer, but we should" is
> that flush tells the receiver that the sending end is "done" feeding
> it a class of data, and the data before flush do not have to reach
> the receiver immediately, hence we can afford to buffer on the
> sending end if we can measure that doing so would give us better
> performance. We haven't measure and turned buffering on yet.
So funny story. JGit actually buffers the pkt-line writes in memory
and does flushes to the network socket when any of the following
happen:
- fixed size in-memory buffer is full (8k or 32k by default)
- flush-pkt is needed in the protocol
- JGit forces a flush without a flush-pkt
There are places in the protocol where data needs to be shared with
the peer *despite* not having a flush-pkt present in the data stream
to do that. ERR is one of these places. "done\n" at the end of the
negotiation in a client is another. Sending ACK/NAK in a multi_ack
from upload-pack is another. I may be missing more.
JGit had to define three methods to make the pkt-line protocol work correctly:
- writeString: format a string as a single pkt-line, insert into buffer.
- end: write "000" into the buffer, flush the buffer.
- flush: flush the buffer if it has any content.
I always found that comment in front of that function funny. Its
totally not true. Fixing it is harder than just stuffing a buffer in
there and hoping for the best. The callers need work. At which point
that function isn't really what its author was trying to do.
> But when dying, we may of course have data before flushing. We may
> disconnect (by dying) without showing flush (or preceding ERR) to
> the other side, and for that reason, not relying on the flush packet
> on the receiving end is a good change. Once we turn buffering on, we
> probably need to be careful when sending an ERR indicator by making
> it always drain any buffered data and show the ERR indicator without
> buffering, or something.
Yes.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 10/10] remote-curl: always parse incoming refs
2013-02-18 10:50 ` Jonathan Nieder
@ 2013-02-20 7:41 ` Shawn Pearce
0 siblings, 0 replies; 50+ messages in thread
From: Shawn Pearce @ 2013-02-20 7:41 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Jeff King, git
On Mon, Feb 18, 2013 at 1:30 AM, Jeff King <peff@peff.net> wrote:
> When remote-curl receives a list of refs from a server, it
> keeps the whole buffer intact. When we get a "list" command,
> we feed the result to get_remote_heads, and when we get a
> "fetch" or "push" command, we feed it to fetch-pack or
> send-pack, respectively.
>
> If the HTTP response from the server is truncated for any
> reason,
...
> As a result, fetch-pack hangs, waiting for input. However,
> remote-curl believes it has sent all of the advertisement,
> and therefore waits for fetch-pack to speak. The two
> processes end up in a deadlock.
Eek. Thanks for fixing this.
On Mon, Feb 18, 2013 at 2:50 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Jeff King wrote:
>
>> remote-curl.c | 23 +++++++++++++----------
>> 1 file changed, 13 insertions(+), 10 deletions(-)
>
> I like.
Me too.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-18 21:33 ` Jeff King
@ 2013-02-20 8:47 ` Jeff King
2013-02-20 8:54 ` Junio C Hamano
0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2013-02-20 8:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Nieder, git, Shawn O. Pearce
On Mon, Feb 18, 2013 at 04:33:31PM -0500, Jeff King wrote:
> On Mon, Feb 18, 2013 at 01:25:35PM -0800, Junio C Hamano wrote:
>
> > Jeff King <peff@peff.net> writes:
> >
> > > But it's easy to do (1), and it starts the clock ticking for
> > > the 1000-byte readers to become obsolete.
> >
> > Yup, I agree with that goal.
>
> Having just looked at the pkt-line callers a lot, I think most of them
> could go for something like:
> [...]
>
> That would actually simplify the callers a bit, and would harmonize the
> buffer sizes at the same time. I'll look into doing a series tonight.
Just a quick update on this series. It ended up taking more nights than
I thought. :) The result looks much better than what I posted before, and
I found several other corner cases and bugs in packet parsing, too.
I'm going to hold off on posting it tonight, as I'm now up to 19
patches, and after several rounds of "rebase -i", I need to give it a
final read-through myself before inflicting it on anyone else. I'll do
that and post it tomorrow.
In the meantime, please hold off on what I've posted so far (that
includes the jk/smart-http-robustify topic).
-Peff
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCHv2 04/10] pkt-line: change error message for oversized packet
2013-02-20 8:47 ` Jeff King
@ 2013-02-20 8:54 ` Junio C Hamano
0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2013-02-20 8:54 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, Shawn O. Pearce
Jeff King <peff@peff.net> wrote:
>
>In the meantime, please hold off on what I've posted so far (that
>includes the jk/smart-http-robustify topic).
Surely. I'm done for the night already. Looking forward to see the reroll tomorrow.
Thanks.
--
Pardon terseness, typo and HTML from a tablet.
^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2013-02-20 8:56 UTC | newest]
Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-16 6:44 [PATCH 0/3] make smart-http more robust against bogus server input Jeff King
2013-02-16 6:46 ` [PATCH 1/3] pkt-line: teach packet_get_line a no-op mode Jeff King
2013-02-17 10:41 ` Jonathan Nieder
2013-02-16 6:47 ` [PATCH 2/3] remote-curl: verify smart-http metadata lines Jeff King
2013-02-17 10:49 ` Jonathan Nieder
2013-02-17 19:14 ` Jeff King
2013-02-18 0:54 ` Jonathan Nieder
2013-02-18 8:59 ` Jeff King
2013-02-16 6:49 ` [PATCH 3/3] remote-curl: sanity check ref advertisement from server Jeff King
2013-02-17 11:05 ` Jonathan Nieder
2013-02-17 19:28 ` Jeff King
2013-02-18 1:41 ` Jonathan Nieder
2013-02-18 9:12 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Jeff King
2013-02-18 9:14 ` [PATCHv2 01/10] pkt-line: move a misplaced comment Jeff King
2013-02-18 9:15 ` [PATCHv2 02/10] pkt-line: drop safe_write function Jeff King
2013-02-18 9:56 ` Jonathan Nieder
2013-02-18 10:24 ` Jeff King
2013-02-18 9:16 ` [PATCHv2 03/10] pkt-line: clean up "gentle" reading function Jeff King
2013-02-18 10:12 ` Jonathan Nieder
2013-02-18 10:25 ` Jeff King
2013-02-18 9:22 ` [PATCHv2 04/10] pkt-line: change error message for oversized packet Jeff King
2013-02-18 9:40 ` Junio C Hamano
2013-02-18 9:49 ` Jeff King
2013-02-18 21:25 ` Junio C Hamano
2013-02-18 21:33 ` Jeff King
2013-02-20 8:47 ` Jeff King
2013-02-20 8:54 ` Junio C Hamano
2013-02-18 10:15 ` Jonathan Nieder
2013-02-18 10:26 ` Jeff King
2013-02-18 9:22 ` [PATCHv2 05/10] pkt-line: rename s/packet_read_line/packet_read/ Jeff King
2013-02-18 10:19 ` Jonathan Nieder
2013-02-18 10:29 ` Jeff King
2013-02-18 11:05 ` Jonathan Nieder
2013-02-18 9:26 ` [PATCHv2 06/10] pkt-line: share buffer/descriptor reading implementation Jeff King
2013-02-18 10:43 ` Jonathan Nieder
2013-02-18 10:48 ` Jeff King
2013-02-18 10:54 ` Jonathan Nieder
2013-02-18 9:27 ` [PATCHv2 07/10] teach get_remote_heads to read from a memory buffer Jeff King
2013-02-18 9:29 ` [PATCHv2 08/10] remote-curl: pass buffer straight to get_remote_heads Jeff King
2013-02-18 10:47 ` Jonathan Nieder
2013-02-18 9:29 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Junio C Hamano
2013-02-18 9:33 ` Jeff King
2013-02-18 9:49 ` Junio C Hamano
2013-02-18 9:55 ` Jeff King
2013-02-20 7:14 ` Shawn Pearce
2013-02-18 9:29 ` [PATCHv2 09/10] remote-curl: move ref-parsing code up in file Jeff King
2013-02-18 9:30 ` [PATCHv2 10/10] remote-curl: always parse incoming refs Jeff King
2013-02-18 10:50 ` Jonathan Nieder
2013-02-20 7:41 ` Shawn Pearce
2013-02-20 7:05 ` [PATCHv2 0/10] pkt-line and remote-curl cleanups server Shawn Pearce
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).