git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready"
@ 2011-03-14 23:48 Shawn O. Pearce
  2011-03-14 23:48 ` [PATCH 2/2] upload-pack: More aggressively send 'ACK %s ready' Shawn O. Pearce
  2011-03-17  7:15 ` [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Jeff King
  0 siblings, 2 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2011-03-14 23:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

If multi_ack_detailed was selected in the protocol capabilities
(both client and server are >= Git 1.6.6) the upload-pack side will
send "ACK %s ready" when it knows how to safely cut the graph and
produce a reasonable pack for the want list that was already sent
on the connection.

Upon receiving "ACK %s ready" there is no point in looking at
the remaining commits inside of rev_list.  Sending additional
"have %s" lines to the remote will not construct a smaller pack.
It is unlikely a commit older than the current cut point will have
a better delta base than the cut point itself has.

The original design of this code had fetch-pack empty rev_list by
marking a commit and its transitive ancestors COMMON whenever the
remote side said "ACK %s {continue,common}" and skipping over any
already COMMON commits during get_rev().  This approach does not
work when most of rev_list is actually COMMON_REF, commits that
are pointed to by a reference on the remote, which exist locally,
and which have not yet been sent to the remote as a "have %s" line.

Most of the common references are tags in the ref/tags namespace,
using points in the commit graph that are more than 1 commit apart.
In git.git itself, this is currently 340 tags, 339 of which point to
commits in the commit graph.  fetch-pack pushes all of these into
rev_list, but is unable to mark them COMMON and discard during a
remote's "ACK %s {continue,common}" because it does not parse through
the entire parent chain.  Not parsing the entire parent chain is
an optimization to avoid walking back to the roots of the repository.

Assuming the client is only following the remote (and does not make
its own local commits), the client needs 11 rounds to spin through
the entire list of tags (32 commits per round, ceil(339/32) == 11).
Unfortunately the server knows on the first "have %s" line that
it can produce a good pack, and does not need to see the remaining
320 tags in the other 10 rounds.

Over git:// and ssh:// this isn't as bad as it sounds, the client is
only transmitting an extra 16,000 bytes that it doesn't need to send.

Over smart HTTP, the client must do an additional 10 HTTP POST
requests, each of which incurs round-trip latency, and must upload
the entire state vector of all known common objects.  On the final
POST request, this is 16 KiB worth of data.

Fix all of this by clearing rev_list as soon as the remote side
says it can construct a pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 builtin/fetch-pack.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index b999413..dfacfdf 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -379,6 +379,8 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 					retval = 0;
 					in_vain = 0;
 					got_continue = 1;
+					if (ack == ACK_ready)
+					  rev_list = NULL;
 					break;
 					}
 				}
-- 
1.7.0.9.5.g6bd7.dirty

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] upload-pack: More aggressively send 'ACK %s ready'
  2011-03-14 23:48 [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Shawn O. Pearce
@ 2011-03-14 23:48 ` Shawn O. Pearce
  2011-03-17  7:15 ` [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Jeff King
  1 sibling, 0 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2011-03-14 23:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

If a client is merely following the remote (and has not made any
new commits itself), all "have %s" lines sent by the client will be
common to the server.  As all lines are common upload-pack never
calls ok_to_give_up() and does not compute if it has a good cut
point in the commit graph.

Without this computation the following client is going to send all
tagged commits, as these were determined to be COMMON_REF during the
initial advertisement, but the client does not parse their history
to transitively pass the COMMON flag and empty its queue of commits.

For git.git with 339 commit tags, it takes clients 11 rounds of
negotation to fully send all tagged commits and exhaust its queue
of things to send as common.  This is pretty slow for a client that
has not done any local development activity.

Force computing ok_to_give_up() and send "ACK %s ready" at the end
of the current round if this round only contained common objects
and ok_to_give_up() was therefore not called.  This may allow the
client to break early, avoiding transmission of the COMMON_REFs.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 upload-pack.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index b40a43f..2a0f19e 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -429,6 +429,8 @@ static int get_common_commits(void)
 	static char line[1000];
 	unsigned char sha1[20];
 	char last_hex[41];
+	int got_common = 0;
+	int got_other = 0;
 
 	save_commit_buffer = 0;
 
@@ -437,16 +439,22 @@ static int get_common_commits(void)
 		reset_timeout();
 
 		if (!len) {
+			if (multi_ack == 2 && got_common
+					&& !got_other && ok_to_give_up())
+				packet_write(1, "ACK %s ready\n", last_hex);
 			if (have_obj.nr == 0 || multi_ack)
 				packet_write(1, "NAK\n");
 			if (stateless_rpc)
 				exit(0);
+			got_common = 0;
+			got_other = 0;
 			continue;
 		}
 		strip(line, len);
 		if (!prefixcmp(line, "have ")) {
 			switch (got_sha1(line+5, sha1)) {
 			case -1: /* they have what we do not */
+				got_other = 1;
 				if (multi_ack && ok_to_give_up()) {
 					const char *hex = sha1_to_hex(sha1);
 					if (multi_ack == 2)
@@ -456,6 +464,7 @@ static int get_common_commits(void)
 				}
 				break;
 			default:
+				got_common = 1;
 				memcpy(last_hex, sha1_to_hex(sha1), 41);
 				if (multi_ack == 2)
 					packet_write(1, "ACK %s common\n", last_hex);
-- 
1.7.0.9.5.g6bd7.dirty

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready"
  2011-03-14 23:48 [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Shawn O. Pearce
  2011-03-14 23:48 ` [PATCH 2/2] upload-pack: More aggressively send 'ACK %s ready' Shawn O. Pearce
@ 2011-03-17  7:15 ` Jeff King
  2011-03-17  7:16   ` Jeff King
  2011-03-17 15:51   ` Shawn Pearce
  1 sibling, 2 replies; 5+ messages in thread
From: Jeff King @ 2011-03-17  7:15 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

On Mon, Mar 14, 2011 at 04:48:38PM -0700, Shawn O. Pearce wrote:

> Upon receiving "ACK %s ready" there is no point in looking at
> the remaining commits inside of rev_list.  Sending additional
> "have %s" lines to the remote will not construct a smaller pack.
> It is unlikely a commit older than the current cut point will have
> a better delta base than the cut point itself has.
> [...]
> Assuming the client is only following the remote (and does not make
> its own local commits), the client needs 11 rounds to spin through
> the entire list of tags (32 commits per round, ceil(339/32) == 11).
> Unfortunately the server knows on the first "have %s" line that
> it can produce a good pack, and does not need to see the remaining
> 320 tags in the other 10 rounds.

Does this optimization help in that case? From looking at the code, it
seems that we offer "ACK %s ready" only in the case that the client
has something we do not. I.e., they _are_ building local commits on top.

> Over smart HTTP, the client must do an additional 10 HTTP POST
> requests, each of which incurs round-trip latency, and must upload
> the entire state vector of all known common objects.  On the final
> POST request, this is 16 KiB worth of data.

This optimization aside, I wonder if it is worth bumping up the number
of haves we send in a chunk from 32 to something higher.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready"
  2011-03-17  7:15 ` [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Jeff King
@ 2011-03-17  7:16   ` Jeff King
  2011-03-17 15:51   ` Shawn Pearce
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff King @ 2011-03-17  7:16 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

On Thu, Mar 17, 2011 at 03:15:12AM -0400, Jeff King wrote:

> > Assuming the client is only following the remote (and does not make
> > its own local commits), the client needs 11 rounds to spin through
> > the entire list of tags (32 commits per round, ceil(339/32) == 11).
> > Unfortunately the server knows on the first "have %s" line that
> > it can produce a good pack, and does not need to see the remaining
> > 320 tags in the other 10 rounds.
> 
> Does this optimization help in that case? From looking at the code, it
> seems that we offer "ACK %s ready" only in the case that the client
> has something we do not. I.e., they _are_ building local commits on top.

OK, never mind, I just read your 2/2. ;)

Sorry for the noise.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready"
  2011-03-17  7:15 ` [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Jeff King
  2011-03-17  7:16   ` Jeff King
@ 2011-03-17 15:51   ` Shawn Pearce
  1 sibling, 0 replies; 5+ messages in thread
From: Shawn Pearce @ 2011-03-17 15:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Thu, Mar 17, 2011 at 00:15, Jeff King <peff@peff.net> wrote:
>> Over smart HTTP, the client must do an additional 10 HTTP POST
>> requests, each of which incurs round-trip latency, and must upload
>> the entire state vector of all known common objects.  On the final
>> POST request, this is 16 KiB worth of data.
>
> This optimization aside, I wonder if it is worth bumping up the number
> of haves we send in a chunk from 32 to something higher.

I have been considering that myself. 32 isn't a good number here. Its
1604 bytes per round (32 have lines, and flush-pkt). Ethernet's
default MTU is 1500 bytes, so stopping at 32 causes us to need more
than one packet, and the last one isn't full. In native git:// or
ssh:// where its bi-directional the client does "race ahead" and send
another 1604 bytes, but now we're at 3208 bytes which still doesn't
fit well within the MTU. :-\

I've thought about increasing this to 64. On git:// or ssh:// that can
be wasteful as the remote should be able to stop us earlier,
especially if we are common very early (e.g. infrequent contributor
who is only 1 commit ahead). The same is true for smart HTTP, boosting
it to 64 would help the maintainer pull from a lieutenant with fewer
rounds, but it hurts the infrequent contributor as his only round is
larger for no good reason.

The better approach might be to automatically double the round size on
each successive round, until we reach an upper limit of say 1024. For
the infrequent contributor we might even consider cutting the initial
round to 16, as it would allow the entire initial round to fit into a
single Ethernet MTU over git://, and with the no-done capability, the
entire exchange is over in that single packet. :-)

For the maintainer, 16 is way too small, but they will then try 32,
64, 128, 256, 512... and should pick up the common point quickly.
Assuming the maintainer is already 600 commits ahead when he pulls,
we'll find the common point in 6 rounds, vs. the current approach that
requires 19 rounds.


But we should really cap the size at something sane like 1024 to
prevent HTTP POST payloads from being more than 64 KiB. But
practically this is <32 KiB because we gzip the POST body, and there
only content is hex digits and the word "have". It should be deflating
to smaller than 50% of the original size. For various selfish reasons
I wish I could keep this under 8192 bytes for the entire HTTP headers
and POST body, but this is under 64 "have" lines, and that's useless.

I'll try putting together this exponential round size patch today, I
have a few other git things to do today.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-03-17 15:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-14 23:48 [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Shawn O. Pearce
2011-03-14 23:48 ` [PATCH 2/2] upload-pack: More aggressively send 'ACK %s ready' Shawn O. Pearce
2011-03-17  7:15 ` [PATCH 1/2] fetch-pack: Finish negotation if remote replies "ACK %s ready" Jeff King
2011-03-17  7:16   ` Jeff King
2011-03-17 15:51   ` Shawn Pearce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).