Git development
 help / color / mirror / Atom feed
* [PATCH 0/14] resumable network bundles
From: Jeff King @ 2011-11-10  7:43 UTC (permalink / raw)
  To: git

One possible option for resumable clones that has been discussed is
letting the server point the client by http to a static bundle
containing most of history, followed by a fetch from the actual git repo
(which should be much cheaper now that we have all of the bundled
history).  This series implements "step 0" of this plan: just letting
bundles be fetched across the network in the first place.

Shawn raised some issues about using bundles for this (as opposed to
accessing the packfiles themselves); specifically, this raises the I/O
footprint of a repository that has to serve both the bundled version of
the pack and the regular packfile.

So it may be that we don't follow this plan all the way through.
However, even if we don't, fetching bundles over http is still a useful
thing to be able to do. Which makes this first step worth doing either
way.

  [01/14]: t/lib-httpd: check for NO_CURL
  [02/14]: http: turn off curl signals
  [03/14]: http: refactor http_request function
  [04/14]: http: add a public function for arbitrary-callback request
  [05/14]: remote-curl: use http callback for requesting refs
  [06/14]: transport: factor out bundle to ref list conversion
  [07/14]: bundle: add is_bundle_buf helper
  [08/14]: remote-curl: free "discovery" object
  [09/14]: remote-curl: auto-detect bundles when fetching refs
  [10/14]: remote-curl: try base $URL after $URL/info/refs
  [11/14]: progress: allow pure-throughput progress meters
  [12/14]: remote-curl: show progress for bundle downloads
  [13/14]: remote-curl: resume interrupted bundle transfers
  [14/14]: clone: give advice on how to resume a failed clone

-Peff

^ permalink raw reply

* Re: [PATCH 0/14] resumable network bundles
From: Jeff King @ 2011-11-10  7:45 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

On Thu, Nov 10, 2011 at 02:43:30AM -0500, Jeff King wrote:

>   [01/14]: t/lib-httpd: check for NO_CURL
>   [02/14]: http: turn off curl signals
>   [03/14]: http: refactor http_request function
>   [04/14]: http: add a public function for arbitrary-callback request
>   [05/14]: remote-curl: use http callback for requesting refs
>   [06/14]: transport: factor out bundle to ref list conversion
>   [07/14]: bundle: add is_bundle_buf helper
>   [08/14]: remote-curl: free "discovery" object
>   [09/14]: remote-curl: auto-detect bundles when fetching refs
>   [10/14]: remote-curl: try base $URL after $URL/info/refs
>   [11/14]: progress: allow pure-throughput progress meters
>   [12/14]: remote-curl: show progress for bundle downloads
>   [13/14]: remote-curl: resume interrupted bundle transfers
>   [14/14]: clone: give advice on how to resume a failed clone

I forgot to mention: this goes on top of mf/curl-select-fdset. It's only
in next now, but some of my http cleanups build semantically on the
cleanups in that topic.

-Peff

^ permalink raw reply

* [PATCH 01/14] t/lib-httpd: check for NO_CURL
From: Jeff King @ 2011-11-10  7:46 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

We already do this check individually in each of the tests
that includes lib-httpd. Let's factor it out.

There is one test (t5540) that uses lib-httpd but does not
currently do this check. But it actually has a stricter
check which is a superset (it needs all of the requirements
to have built git-http-push, one of which is not setting
NO_CURL), so adding this extra check won't hurt anything.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh          |    5 +++++
 t/t5541-http-push.sh    |    5 -----
 t/t5550-http-fetch.sh   |    5 -----
 t/t5551-http-fetch.sh   |    5 -----
 t/t5561-http-backend.sh |    5 -----
 5 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index f7dc078..8331527 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -3,6 +3,11 @@
 # Copyright (c) 2008 Clemens Buchacher <drizzd@aon.at>
 #
 
+if test -n "$NO_CURL"; then
+	skip_all='skipping test, git built without http support'
+	test_done
+fi
+
 if test -z "$GIT_TEST_HTTPD"
 then
 	skip_all="Network testing disabled (define GIT_TEST_HTTPD to enable)"
diff --git a/t/t5541-http-push.sh b/t/t5541-http-push.sh
index a73c826..a326ee0 100755
--- a/t/t5541-http-push.sh
+++ b/t/t5541-http-push.sh
@@ -6,11 +6,6 @@
 test_description='test smart pushing over http via http-backend'
 . ./test-lib.sh
 
-if test -n "$NO_CURL"; then
-	skip_all='skipping test, git built without http support'
-	test_done
-fi
-
 ROOT_PATH="$PWD"
 LIB_HTTPD_PORT=${LIB_HTTPD_PORT-'5541'}
 . "$TEST_DIRECTORY"/lib-httpd.sh
diff --git a/t/t5550-http-fetch.sh b/t/t5550-http-fetch.sh
index 311a33c..e9282c5 100755
--- a/t/t5550-http-fetch.sh
+++ b/t/t5550-http-fetch.sh
@@ -3,11 +3,6 @@
 test_description='test dumb fetching over http via static file'
 . ./test-lib.sh
 
-if test -n "$NO_CURL"; then
-	skip_all='skipping test, git built without http support'
-	test_done
-fi
-
 LIB_HTTPD_PORT=${LIB_HTTPD_PORT-'5550'}
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
diff --git a/t/t5551-http-fetch.sh b/t/t5551-http-fetch.sh
index 26d3557..3557f2e 100755
--- a/t/t5551-http-fetch.sh
+++ b/t/t5551-http-fetch.sh
@@ -3,11 +3,6 @@
 test_description='test smart fetching over http via http-backend'
 . ./test-lib.sh
 
-if test -n "$NO_CURL"; then
-	skip_all='skipping test, git built without http support'
-	test_done
-fi
-
 LIB_HTTPD_PORT=${LIB_HTTPD_PORT-'5551'}
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
diff --git a/t/t5561-http-backend.sh b/t/t5561-http-backend.sh
index b5d7fbc..974be7c 100755
--- a/t/t5561-http-backend.sh
+++ b/t/t5561-http-backend.sh
@@ -3,11 +3,6 @@
 test_description='test git-http-backend'
 . ./test-lib.sh
 
-if test -n "$NO_CURL"; then
-	skip_all='skipping test, git built without http support'
-	test_done
-fi
-
 LIB_HTTPD_PORT=${LIB_HTTPD_PORT-'5561'}
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 02/14] http: turn off curl signals
From: Jeff King @ 2011-11-10  7:48 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

Curl sets and clears the handler for SIGALRM, which makes it
incompatible with git's progress code. However, we can ask
curl not to do this.

Signed-off-by: Jeff King <peff@peff.net>
---
I'm a little iffy on this one. If I understand correctly, depending on
the build and configuration, curl may not be able to timeout during DNS
lookups. But I'm not sure if it does, anyway, since we don't set any
timeouts.

An alternate plan would be to give the progress code a mode where it
gets poked by curl every second or so (curl has a PROGRESSFUNCTION
option for doing this).

 http.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/http.c b/http.c
index 95c2dfd..4f9e004 100644
--- a/http.c
+++ b/http.c
@@ -318,6 +318,8 @@ static int has_cert_password(void)
 	if (curl_http_proxy)
 		curl_easy_setopt(result, CURLOPT_PROXY, curl_http_proxy);
 
+	curl_easy_setopt(result, CURLOPT_NOSIGNAL, 1);
+
 	return result;
 }
 
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 03/14] http: refactor http_request function
From: Jeff King @ 2011-11-10  7:48 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

This function takes a flag to indicate where the output
should go (either to a file or to a strbuf). This flag is
mostly used to set the callback function we hand to curl.

This isn't very flexible for adding new output types.
Instead, let's just let callers pass in the callback
function directly. This results in shorter, more readable,
and more flexible code.

The only other thing the flag was used for was to set a
"Range" header when we already have a partial file (by using
the results of ftell). This patch also adds an "offset"
parameter, which can be used by callers to specify this
feature separately (which is also more flexible, as non-FILE
callers can now resume partial transfers).

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c |   37 ++++++++++++++-----------------------
 1 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/http.c b/http.c
index 4f9e004..9ffd894 100644
--- a/http.c
+++ b/http.c
@@ -797,11 +797,8 @@ void append_remote_object_url(struct strbuf *buf, const char *url,
 	return strbuf_detach(&buf, NULL);
 }
 
-/* http_request() targets */
-#define HTTP_REQUEST_STRBUF	0
-#define HTTP_REQUEST_FILE	1
-
-static int http_request(const char *url, void *result, int target, int options)
+static int http_request(const char *url, curl_write_callback cb, void *result,
+			long offset, int options)
 {
 	struct active_request_slot *slot;
 	struct slot_results results;
@@ -818,19 +815,13 @@ static int http_request(const char *url, void *result, int target, int options)
 	} else {
 		curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
 		curl_easy_setopt(slot->curl, CURLOPT_FILE, result);
+		curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, cb);
+	}
 
-		if (target == HTTP_REQUEST_FILE) {
-			long posn = ftell(result);
-			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
-					 fwrite);
-			if (posn > 0) {
-				strbuf_addf(&buf, "Range: bytes=%ld-", posn);
-				headers = curl_slist_append(headers, buf.buf);
-				strbuf_reset(&buf);
-			}
-		} else
-			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
-					 fwrite_buffer);
+	if (offset > 0) {
+		strbuf_addf(&buf, "Range: bytes=%lu-", offset);
+		headers = curl_slist_append(headers, buf.buf);
+		strbuf_reset(&buf);
 	}
 
 	strbuf_addstr(&buf, "Pragma:");
@@ -881,18 +872,18 @@ static int http_request(const char *url, void *result, int target, int options)
 	return ret;
 }
 
-static int http_request_reauth(const char *url, void *result, int target,
-			       int options)
+static int http_request_reauth(const char *url, curl_write_callback cb,
+			       void *result, unsigned long offset, int options)
 {
-	int ret = http_request(url, result, target, options);
+	int ret = http_request(url, cb, result, offset, options);
 	if (ret != HTTP_REAUTH)
 		return ret;
-	return http_request(url, result, target, options);
+	return http_request(url, cb, result, offset, options);
 }
 
 int http_get_strbuf(const char *url, struct strbuf *result, int options)
 {
-	return http_request_reauth(url, result, HTTP_REQUEST_STRBUF, options);
+	return http_request_reauth(url, fwrite_buffer, result, 0, options);
 }
 
 /*
@@ -915,7 +906,7 @@ static int http_get_file(const char *url, const char *filename, int options)
 		goto cleanup;
 	}
 
-	ret = http_request_reauth(url, result, HTTP_REQUEST_FILE, options);
+	ret = http_request_reauth(url, NULL, result, ftell(result), options);
 	fclose(result);
 
 	if ((ret == HTTP_OK) && move_temp_to_file(tmpfile.buf, filename))
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* Re: RFH: unexpected reflog behavior with --since=
From: Eric Raible @ 2011-11-10  7:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org
In-Reply-To: <20111109220128.GA31535@sigill.intra.peff.net>

On 11/9/2011 2:01 PM, Jeff King wrote:
> On Tue, Nov 08, 2011 at 04:22:41PM -0800, Eric Raible wrote:
> 
> [explanation how --since is used to limits traversal omitted]

Yes, all that is as expected, and makes sense.

> Now let's look at reflog walking. It's kind of bolted on to the side
> of the revision traversal machinery. We walk through the reflog
> backwards and pretend that entry N's parent is entry N-1 (you can see
> this if you do "git log -g -p", for example; you see the patch versus
> the last reflog entry, not the patch against the commit's true parent).
> 
> In the case of rewound history (like the reset you showed above), this
> means that the history graph will appear to have bad clock skew. The
> timestamp of HEAD@{0} is going to be much earlier than its pretend
> parent, HEAD@{1}. And the "--since" optimization is going to cut off
> traversal, even though there are more interesting commits to be shown.
> 
> So in that sense, I think it's a bug, and we should probably disable the
> exit-early-from-traversal optimization when we're walking reflogs.

Indeed.  Seems like a case of an optimization leading to an incorrect result.

> But it may also be a misfeature, because it's not clear what you're
> actually trying to limit by. We have commit timestamps, of course, but
> when we are walking reflogs, we also have reflog timestamps. Did you
> actually want to say "show me all commits in the reflog, in reverse
> reflog order, omitting commits that happened before time t"? Or did you
> really mean "show me the reflog entries that happened before time t,
> regardless of their commit timestamp"?

I meant "show me the reflog entries that happened *since* time t,
regardless of their commit timestamp.

> In the latter case, we would either need a new specifier (like
> "--reflog-since"), or to rewrite the commit timestamp when we rewrite
> the parent pointers.
> 
> The latter has a certain elegance to it (we are making a pretend linear
> history graph out of the reflog, so faking the timestamps to be sensible
> and in order is a logical thing to do) but I worry about lying too much
> in the output. Something like "git log -g --format=%cd" would now have
> the fake timestamp in the output. But then, we already show the fake
> parents in the output, so I don't know that this is any worse.

Since -g is asking specifying for the reflog, and since the reflog has
its own timestamps, I would expect that those timestamps be used.

^ permalink raw reply

* [PATCH 04/14] http: add a public function for arbitrary-callback request
From: Jeff King @ 2011-11-10  7:49 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

The http_request function recently learned to take arbitrary
callbacks; let's expose this functionality to callers.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c |    6 ++++++
 http.h |    3 +++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/http.c b/http.c
index 9ffd894..91451e9 100644
--- a/http.c
+++ b/http.c
@@ -886,6 +886,12 @@ int http_get_strbuf(const char *url, struct strbuf *result, int options)
 	return http_request_reauth(url, fwrite_buffer, result, 0, options);
 }
 
+int http_get_callback(const char *url, curl_write_callback cb,
+		      void *data, long offset, int options)
+{
+	return http_request_reauth(url, cb, data, offset, options);
+}
+
 /*
  * Downloads an url and stores the result in the given file.
  *
diff --git a/http.h b/http.h
index ee16069..4977bde 100644
--- a/http.h
+++ b/http.h
@@ -132,6 +132,9 @@ extern void append_remote_object_url(struct strbuf *buf, const char *url,
  */
 int http_get_strbuf(const char *url, struct strbuf *result, int options);
 
+int http_get_callback(const char *url, curl_write_callback cb, void *data,
+		      long offset, int options);
+
 /*
  * Prints an error message using error() containing url and curl_errorstr,
  * and returns ret.
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 05/14] remote-curl: use http callback for requesting refs
From: Jeff King @ 2011-11-10  7:49 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

This should behave identically to the current strbuf code,
but opens up room for us to do more clever things with
bundles in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
Obviously it's way more code for the same thing, but future patches will
make the design more clear.

 remote-curl.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 0e720ee..fb4d853 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -90,6 +90,24 @@ static void free_discovery(struct discovery *d)
 	}
 }
 
+struct get_refs_cb_data {
+	struct strbuf *out;
+};
+
+static size_t get_refs_callback(char *buf, size_t sz, size_t n, void *vdata)
+{
+	struct get_refs_cb_data *data = vdata;
+	strbuf_add(data->out, buf, sz * n);
+	return sz * n;
+}
+
+static int get_refs_from_url(const char *url, struct strbuf *out, int options)
+{
+	struct get_refs_cb_data data;
+	data.out = out;
+	return http_get_callback(url, get_refs_callback, &data, 0, options);
+}
+
 static struct discovery* discover_refs(const char *service)
 {
 	struct strbuf buffer = STRBUF_INIT;
@@ -112,7 +130,7 @@ static void free_discovery(struct discovery *d)
 	}
 	refs_url = strbuf_detach(&buffer, NULL);
 
-	http_ret = http_get_strbuf(refs_url, &buffer, HTTP_NO_CACHE);
+	http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE);
 
 	/* try again with "plain" url (no ? or & appended) */
 	if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) {
@@ -123,7 +141,7 @@ static void free_discovery(struct discovery *d)
 		strbuf_addf(&buffer, "%sinfo/refs", url);
 		refs_url = strbuf_detach(&buffer, NULL);
 
-		http_ret = http_get_strbuf(refs_url, &buffer, HTTP_NO_CACHE);
+		http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE);
 	}
 
 	switch (http_ret) {
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 06/14] transport: factor out bundle to ref list conversion
From: Jeff King @ 2011-11-10  7:49 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

There is a data structure mismatch between what the
transport code wants (a linked list of "struct ref") and
what the bundle header provides (an array of ref names and
sha1s), so the transport code has to convert.

Let's factor out this conversion to make it useful to other
transport-ish callers (like remote-curl).

Signed-off-by: Jeff King <peff@peff.net>
---
 bundle.c    |   16 ++++++++++++++++
 bundle.h    |    2 ++
 transport.c |   11 +----------
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/bundle.c b/bundle.c
index 08020bc..e48fe2f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -7,6 +7,7 @@
 #include "list-objects.h"
 #include "run-command.h"
 #include "refs.h"
+#include "remote.h"
 
 static const char bundle_signature[] = "# v2 git bundle\n";
 
@@ -449,3 +450,18 @@ int unbundle(struct bundle_header *header, int bundle_fd, int flags)
 		return error("index-pack died");
 	return 0;
 }
+
+struct ref *bundle_header_to_refs(const struct bundle_header *header)
+{
+	struct ref *result = NULL;
+	int i;
+
+	for (i = 0; i < header->references.nr; i++) {
+		struct ref_list_entry *e = header->references.list + i;
+		struct ref *ref = alloc_ref(e->name);
+		hashcpy(ref->old_sha1, e->sha1);
+		ref->next = result;
+		result = ref;
+	}
+	return result;
+}
diff --git a/bundle.h b/bundle.h
index 1584e4d..675cc97 100644
--- a/bundle.h
+++ b/bundle.h
@@ -24,4 +24,6 @@ int create_bundle(struct bundle_header *header, const char *path,
 int list_bundle_refs(struct bundle_header *header,
 		int argc, const char **argv);
 
+struct ref *bundle_header_to_refs(const struct bundle_header *header);
+
 #endif
diff --git a/transport.c b/transport.c
index 51814b5..5020bbb 100644
--- a/transport.c
+++ b/transport.c
@@ -407,8 +407,6 @@ struct bundle_transport_data {
 static struct ref *get_refs_from_bundle(struct transport *transport, int for_push)
 {
 	struct bundle_transport_data *data = transport->data;
-	struct ref *result = NULL;
-	int i;
 
 	if (for_push)
 		return NULL;
@@ -418,14 +416,7 @@ struct bundle_transport_data {
 	data->fd = read_bundle_header(transport->url, &data->header);
 	if (data->fd < 0)
 		die ("Could not read bundle '%s'.", transport->url);
-	for (i = 0; i < data->header.references.nr; i++) {
-		struct ref_list_entry *e = data->header.references.list + i;
-		struct ref *ref = alloc_ref(e->name);
-		hashcpy(ref->old_sha1, e->sha1);
-		ref->next = result;
-		result = ref;
-	}
-	return result;
+	return bundle_header_to_refs(&data->header);
 }
 
 static int fetch_refs_from_bundle(struct transport *transport,
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 07/14] bundle: add is_bundle_buf helper
From: Jeff King @ 2011-11-10  7:50 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

This is similar to is_bundle, but checks an in-memory buffer
rather than a file. It also works on a partial buffer,
checking only the bundle signature. This means the result is
a three-way conditional: yes, no, or "we do not have enough
data yet".

Signed-off-by: Jeff King <peff@peff.net>
---
 bundle.c |   14 ++++++++++++++
 bundle.h |    1 +
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/bundle.c b/bundle.c
index e48fe2f..6f911bc 100644
--- a/bundle.c
+++ b/bundle.c
@@ -122,6 +122,20 @@ int is_bundle(const char *path, int quiet)
 	return (fd >= 0);
 }
 
+int is_bundle_buf(const char *s, int len)
+{
+	if (len > strlen(bundle_signature))
+		len = strlen(bundle_signature);
+	/* If we don't match what we already have, then definitely not. */
+	if (memcmp(s, bundle_signature, len))
+		return 0;
+	/* If we have enough bytes, we can say yes */
+	if (len == strlen(bundle_signature))
+		return 1;
+	/* otherwise, we can only say "maybe" */
+	return -1;
+}
+
 static int list_refs(struct ref_list *r, int argc, const char **argv)
 {
 	int i;
diff --git a/bundle.h b/bundle.h
index 675cc97..8bec44d 100644
--- a/bundle.h
+++ b/bundle.h
@@ -15,6 +15,7 @@ struct bundle_header {
 };
 
 int is_bundle(const char *path, int quiet);
+int is_bundle_buf(const char *s, int len);
 int read_bundle_header(const char *path, struct bundle_header *header);
 int create_bundle(struct bundle_header *header, const char *path,
 		int argc, const char **argv);
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 08/14] remote-curl: free "discovery" object
From: Jeff King @ 2011-11-10  7:50 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

We cache the last-used "discovery" object, which contains
the data we pulled from the remote about which refs it has,
which saves us an HTTP round-trip when somebody does
something like "list" followed by "fetch".

We don't bother free()ing it at the end of the program
because it just contains memory which will be reclaimed by
the OS. However, cleaning up explicitly will future-proof us
against later changes which will add external storage (like
temporary files).

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index fb4d853..014d413 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -933,6 +933,7 @@ int main(int argc, const char **argv)
 		strbuf_reset(&buf);
 	} while (1);
 
+	free_discovery(last_discovery);
 	http_cleanup();
 
 	return 0;
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 09/14] remote-curl: auto-detect bundles when fetching refs
From: Jeff King @ 2011-11-10  7:50 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

You can't currently fetch from a network bundle, like:

  git fetch http://example.com/foo.bundle

This patch takes the first (and biggest) step towards that
working: it auto-detects when fetching refs results in a
bundle, and automatically spools the bundle to disk and
fetches from it.

There are a few important design decisions to note:

  1. We auto-detect the bundle based on content, not based
     on a special token in the URL (like ending in
     ".bundle"). This lets the server side be flexible with
     its URLs (e.g., "http://example.com/bundle?repo=foo").

  2. When fetching refs, we don't actually fetch $URL, but
     start with $URL/info/refs, looking for smart or dumb
     http. Some servers, when file "foo.bundle" exists, will
     serve it to the client when "foo.bundle/info/refs" is
     requested. Therefore we may be "surprised" to receive a
     bundle when we thought we were just getting the list of
     refs, and need to handle it appropriately.

  3. We spool the bundle to disk, and then run "index-pack
     --fix-thin" to create a packfile. That means we will
     momentarily use twice the size of the bundle in local
     disk space. Avoiding this would mean piping directly to
     "index-pack --fix-thin".  However, if we want to be
     able to resume the transfer of the bundle after an
     interruption, then we need to save the bundle's pack.

     In theory a smart index-pack that was interrupted could
     write out its partial results along with a count of how
     many bytes it actually consumed (i.e., where to resume
     next time), and then pick up where it left off when fed
     the rest of the data. But index-pack isn't that smart
     yet, so let's start off with spooling.

No tests yet, as apache is not one of the "surprising"
servers from (2), and our test harness is based around that
(though just with this patch, you can fetch from surprising
servers like lighttpd).

Signed-off-by: Jeff King <peff@peff.net>
---
This is really the big, interesting one.

 remote-curl.c |  124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 119 insertions(+), 5 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 014d413..84586e0 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -7,6 +7,7 @@
 #include "run-command.h"
 #include "pkt-line.h"
 #include "sideband.h"
+#include "bundle.h"
 
 static struct remote *remote;
 static const char *url; /* always ends with a trailing slash */
@@ -77,6 +78,10 @@ struct discovery {
 	char *buf;
 	size_t len;
 	unsigned proto_git : 1;
+
+	char *bundle_filename;
+	int bundle_fd;
+	struct bundle_header bundle_header;
 };
 static struct discovery *last_discovery;
 
@@ -86,26 +91,93 @@ static void free_discovery(struct discovery *d)
 		if (d == last_discovery)
 			last_discovery = NULL;
 		free(d->buf_alloc);
+		if (d->bundle_fd >= 0)
+			close(d->bundle_fd);
+		if (d->bundle_filename) {
+			unlink(d->bundle_filename);
+			free(d->bundle_filename);
+		}
 		free(d);
 	}
 }
 
 struct get_refs_cb_data {
 	struct strbuf *out;
+
+	int is_bundle;
+	const char *tmpname;
+	FILE *fh;
 };
 
 static size_t get_refs_callback(char *buf, size_t sz, size_t n, void *vdata)
 {
 	struct get_refs_cb_data *data = vdata;
-	strbuf_add(data->out, buf, sz * n);
+	struct strbuf *out = data->out;
+
+	if (data->is_bundle > 0)
+		return fwrite(buf, sz, n, data->fh);
+
+	strbuf_add(out, buf, sz * n);
+
+	if (data->is_bundle == 0)
+		return sz * n;
+
+	data->is_bundle = is_bundle_buf(out->buf, out->len);
+	if (data->is_bundle > 0) {
+		data->fh = fopen(data->tmpname, "wb");
+		if (!data->fh)
+			die_errno("unable to open %s", data->tmpname);
+		if (fwrite(out->buf, 1, out->len, data->fh) < out->len)
+			die_errno("unable to write to %s", data->tmpname);
+	}
 	return sz * n;
 }
 
-static int get_refs_from_url(const char *url, struct strbuf *out, int options)
+static int get_refs_from_url(const char *url, struct strbuf *out, int options,
+			     const char *tmpname, int *is_bundle)
 {
 	struct get_refs_cb_data data;
+	int ret;
+
 	data.out = out;
-	return http_get_callback(url, get_refs_callback, &data, 0, options);
+	data.is_bundle = -1;
+	data.tmpname = tmpname;
+	data.fh = NULL;
+
+	ret = http_get_callback(url, get_refs_callback, &data, 0, options);
+
+	if (data.fh) {
+		if (fclose(data.fh))
+			die_errno("unable to write to %s", data.tmpname);
+	}
+
+	*is_bundle = data.is_bundle > 0;
+	return ret;
+}
+
+static const char *url_to_bundle_tmpfile(const char *url)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int last_was_quoted = 1;
+	const char *ret;
+
+	strbuf_addstr(&buf, "tmp_bundle_");
+	for (; *url; url++) {
+		if (isalpha(*url) || isdigit(*url)) {
+			strbuf_addch(&buf, *url);
+			last_was_quoted = 0;
+		}
+		else if (!last_was_quoted) {
+			strbuf_addch(&buf, '_');
+			last_was_quoted = 1;
+		}
+	}
+	if (last_was_quoted)
+		strbuf_setlen(&buf, buf.len - 1);
+
+	ret = git_path("objects/%s", buf.buf);
+	strbuf_release(&buf);
+	return ret;
 }
 
 static struct discovery* discover_refs(const char *service)
@@ -114,11 +186,15 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options)
 	struct discovery *last = last_discovery;
 	char *refs_url;
 	int http_ret, is_http = 0, proto_git_candidate = 1;
+	const char *filename;
+	int is_bundle;
 
 	if (last && !strcmp(service, last->service))
 		return last;
 	free_discovery(last);
 
+	filename = url_to_bundle_tmpfile(url);
+
 	strbuf_addf(&buffer, "%sinfo/refs", url);
 	if (!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) {
 		is_http = 1;
@@ -130,7 +206,8 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options)
 	}
 	refs_url = strbuf_detach(&buffer, NULL);
 
-	http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE);
+	http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE,
+				     filename, &is_bundle);
 
 	/* try again with "plain" url (no ? or & appended) */
 	if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) {
@@ -141,7 +218,8 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options)
 		strbuf_addf(&buffer, "%sinfo/refs", url);
 		refs_url = strbuf_detach(&buffer, NULL);
 
-		http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE);
+		http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE,
+					     filename, &is_bundle);
 	}
 
 	switch (http_ret) {
@@ -161,6 +239,7 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options)
 	last->service = service;
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
+	last->bundle_fd = -1;
 
 	if (is_http && proto_git_candidate
 		&& 5 <= last->len && last->buf[4] == '#') {
@@ -190,6 +269,10 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options)
 		last->proto_git = 1;
 	}
 
+	else if (is_bundle) {
+		last->bundle_filename = xstrdup(filename);
+	}
+
 	free(refs_url);
 	strbuf_release(&buffer);
 	last_discovery = last;
@@ -276,6 +359,22 @@ static int write_discovery(int in, int out, void *data)
 	return refs;
 }
 
+static void ensure_bundle_open(struct discovery *heads)
+{
+	if (heads->bundle_fd >= 0)
+		return;
+	heads->bundle_fd = read_bundle_header(heads->bundle_filename,
+					      &heads->bundle_header);
+	if (heads->bundle_fd < 0)
+		die("could not read bundle from %s", url);
+}
+
+static struct ref *parse_bundle_refs(struct discovery *heads)
+{
+	ensure_bundle_open(heads);
+	return bundle_header_to_refs(&heads->bundle_header);
+}
+
 static struct ref *get_refs(int for_push)
 {
 	struct discovery *heads;
@@ -287,6 +386,11 @@ static int write_discovery(int in, int out, void *data)
 
 	if (heads->proto_git)
 		return parse_git_refs(heads);
+	if (heads->bundle_filename) {
+		if (for_push)
+			die("cannot push into a remote bundle");
+		return parse_bundle_refs(heads);
+	}
 	return parse_info_refs(heads);
 }
 
@@ -690,11 +794,21 @@ static int fetch_git(struct discovery *heads,
 	return err;
 }
 
+static int fetch_bundle(struct discovery *d,
+			int nr_heads, struct ref **to_fetch)
+{
+	ensure_bundle_open(d);
+	return unbundle(&d->bundle_header, d->bundle_fd,
+			options.progress ? BUNDLE_VERBOSE : 0);
+}
+
 static int fetch(int nr_heads, struct ref **to_fetch)
 {
 	struct discovery *d = discover_refs("git-upload-pack");
 	if (d->proto_git)
 		return fetch_git(d, nr_heads, to_fetch);
+	else if (d->bundle_filename)
+		return fetch_bundle(d, nr_heads, to_fetch);
 	else
 		return fetch_dumb(nr_heads, to_fetch);
 }
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 10/14] remote-curl: try base $URL after $URL/info/refs
From: Jeff King @ 2011-11-10  7:51 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

When fetching via http, we will try "$URL/info/refs" to get
the list of refs. We may get an unexpected bundle from that
transfer, and we already handle that case. But we should
also check just "$URL" to see if it's a bundle.

Signed-off-by: Jeff King <peff@peff.net>
---
And now we can actually test with apache.

 remote-curl.c          |   19 +++++++++++++++++++
 t/t5552-http-bundle.sh |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100755 t/t5552-http-bundle.sh

diff --git a/remote-curl.c b/remote-curl.c
index 84586e0..7734495 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -222,6 +222,25 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options,
 					     filename, &is_bundle);
 	}
 
+	/* try the straight URL for a bundle, but don't impact the
+	 * error reporting that happens below. */
+	if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) {
+		struct strbuf trimmed = STRBUF_INIT;
+		int r;
+
+		strbuf_reset(&buffer);
+
+		strbuf_addstr(&trimmed, url);
+		while (trimmed.len > 0 && trimmed.buf[trimmed.len-1] == '/')
+			strbuf_setlen(&trimmed, trimmed.len - 1);
+
+		r = get_refs_from_url(trimmed.buf, &buffer, 0, filename,
+				      &is_bundle);
+		if (r == HTTP_OK && is_bundle)
+			http_ret = r;
+		strbuf_release(&trimmed);
+	}
+
 	switch (http_ret) {
 	case HTTP_OK:
 		break;
diff --git a/t/t5552-http-bundle.sh b/t/t5552-http-bundle.sh
new file mode 100755
index 0000000..8527403
--- /dev/null
+++ b/t/t5552-http-bundle.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description='test fetching from http-accessible bundles'
+. ./test-lib.sh
+
+LIB_HTTPD_PORT=${LIB_HTTPD_PORT-'5552'}
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'create bundles' '
+	test_commit one &&
+	git bundle create "$HTTPD_DOCUMENT_ROOT_PATH/one.bundle" --all &&
+	test_commit two &&
+	git bundle create "$HTTPD_DOCUMENT_ROOT_PATH/two.bundle" --all ^one
+'
+
+test_expect_success 'clone from bundle' '
+	git clone --bare $HTTPD_URL/one.bundle clone &&
+	echo one >expect &&
+	git --git-dir=clone log -1 --format=%s >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'fetch from bundle' '
+	git --git-dir=clone fetch $HTTPD_URL/two.bundle refs/*:refs/* &&
+	echo two >expect &&
+	git --git-dir=clone log -1 --format=%s >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'cannot clone from partial bundle' '
+	test_must_fail git clone $HTTPD_URL/two.bundle
+'
+
+stop_httpd
+test_done
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 11/14] progress: allow pure-throughput progress meters
From: Jeff King @ 2011-11-10  7:53 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

The progress code assumes we are counting something (usually
objects), even if we are measuring throughput. This works
for fetching packfiles, since they show us the object count
alongside the throughput, like:

  Receiving objects:   2% (301/11968), 22.00 MiB | 10.97 MiB/s

You can also tell the progress code you don't know how many
items you have (by specifying a total of 0), and it looks
like:

  Counting objects: 34957

However, if you're fetching a single large item, you want
throughput but you might not have a meaningful count. You
can say you are getting item 0 or 1 out of 1 total, but then
the percent meter is misleading:

  Downloading:   0% (0/1), 22.00 MiB | 10.97 MiB/s

or

  Downloading: 100% (0/1), 22.00 MiB | 10.97 MiB/s

Neither of those is accurate. You are probably somewhere
between zero and 100 percent through the operation, but you
don't know how far.

Telling it you don't know how many items is even uglier:

  Downloading: 1, 22.00 MiB | 10.97 MiB/s

Instead, this patch will omit the count entirely if you are
on the zero-th item of an unknown number of items. It looks
like:

  Downloading: 22.00 MiB | 10.97 MiB/s

Signed-off-by: Jeff King <peff@peff.net>
---
This was the last amount of work to massage the progress code into doing
what I wanted. It might be nicer if it could show a percentage (if we
know the total size), but there's even more surgery required for that.

 progress.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/progress.c b/progress.c
index 3971f49..92fc3c5 100644
--- a/progress.c
+++ b/progress.c
@@ -103,7 +103,10 @@ static int display(struct progress *progress, unsigned n, const char *done)
 			return 1;
 		}
 	} else if (progress_update) {
-		fprintf(stderr, "%s: %u%s%s", progress->title, n, tp, eol);
+		fprintf(stderr, "%s: ", progress->title);
+		if (n)
+			fprintf(stderr, "%u", n);
+		fprintf(stderr, "%s%s", tp, eol);
 		fflush(stderr);
 		progress_update = 0;
 		return 1;
@@ -113,23 +116,24 @@ static int display(struct progress *progress, unsigned n, const char *done)
 }
 
 static void throughput_string(struct throughput *tp, off_t total,
-			      unsigned int rate)
+			      unsigned int rate, struct progress *p)
 {
+	const char *delim = p->total || p->last_value > 0 ? ", " : "";
 	int l = sizeof(tp->display);
 	if (total > 1 << 30) {
-		l -= snprintf(tp->display, l, ", %u.%2.2u GiB",
+		l -= snprintf(tp->display, l, "%s%u.%2.2u GiB", delim,
 			      (int)(total >> 30),
 			      (int)(total & ((1 << 30) - 1)) / 10737419);
 	} else if (total > 1 << 20) {
 		int x = total + 5243;  /* for rounding */
-		l -= snprintf(tp->display, l, ", %u.%2.2u MiB",
+		l -= snprintf(tp->display, l, "%s%u.%2.2u MiB", delim,
 			      x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
 	} else if (total > 1 << 10) {
 		int x = total + 5;  /* for rounding */
-		l -= snprintf(tp->display, l, ", %u.%2.2u KiB",
+		l -= snprintf(tp->display, l, "%s%u.%2.2u KiB", delim,
 			      x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
 	} else {
-		l -= snprintf(tp->display, l, ", %u bytes", (int)total);
+		l -= snprintf(tp->display, l, "%s%u bytes", delim, (int)total);
 	}
 
 	if (rate > 1 << 10) {
@@ -197,7 +201,7 @@ void display_throughput(struct progress *progress, off_t total)
 		tp->last_misecs[tp->idx] = misecs;
 		tp->idx = (tp->idx + 1) % TP_IDX_MAX;
 
-		throughput_string(tp, total, rate);
+		throughput_string(tp, total, rate, progress);
 		if (progress->last_value != -1 && progress_update)
 			display(progress, progress->last_value, NULL);
 	}
@@ -255,7 +259,7 @@ void stop_progress_msg(struct progress **p_progress, const char *msg)
 		if (tp) {
 			unsigned int rate = !tp->avg_misecs ? 0 :
 					tp->avg_bytes / tp->avg_misecs;
-			throughput_string(tp, tp->curr_total, rate);
+			throughput_string(tp, tp->curr_total, rate, progress);
 		}
 		progress_update = 1;
 		sprintf(bufp, ", %s.\n", msg);
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 12/14] remote-curl: show progress for bundle downloads
From: Jeff King @ 2011-11-10  7:53 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

Generally, the point of fetching from a bundle is that it's
big. Without a progress meter, git will appear to hang
during the long download.

This patch adds a throughput meter (i.e., just the bytes
transferred and the rate). In the long run, we should look
for a content-length header from the server so we can show a
total size and completion percentage. However, displaying
that properly will require some surgery to the progress
code, so let's leave it as a future enhancement.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 7734495..6b0820e 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -8,6 +8,7 @@
 #include "pkt-line.h"
 #include "sideband.h"
 #include "bundle.h"
+#include "progress.h"
 
 static struct remote *remote;
 static const char *url; /* always ends with a trailing slash */
@@ -107,6 +108,9 @@ struct get_refs_cb_data {
 	int is_bundle;
 	const char *tmpname;
 	FILE *fh;
+
+	struct progress *progress;
+	off_t total;
 };
 
 static size_t get_refs_callback(char *buf, size_t sz, size_t n, void *vdata)
@@ -114,8 +118,11 @@ static size_t get_refs_callback(char *buf, size_t sz, size_t n, void *vdata)
 	struct get_refs_cb_data *data = vdata;
 	struct strbuf *out = data->out;
 
-	if (data->is_bundle > 0)
+	if (data->is_bundle > 0) {
+		data->total += sz * n;
+		display_throughput(data->progress, data->total);
 		return fwrite(buf, sz, n, data->fh);
+	}
 
 	strbuf_add(out, buf, sz * n);
 
@@ -129,6 +136,12 @@ static size_t get_refs_callback(char *buf, size_t sz, size_t n, void *vdata)
 			die_errno("unable to open %s", data->tmpname);
 		if (fwrite(out->buf, 1, out->len, data->fh) < out->len)
 			die_errno("unable to write to %s", data->tmpname);
+		if (options.progress) {
+			data->total = out->len;
+			data->progress = start_progress("Downloading bundle", 0);
+			display_progress(data->progress, 0);
+			display_throughput(data->progress, data->total);
+		}
 	}
 	return sz * n;
 }
@@ -143,6 +156,8 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options,
 	data.is_bundle = -1;
 	data.tmpname = tmpname;
 	data.fh = NULL;
+	data.progress = NULL;
+	data.total = 0;
 
 	ret = http_get_callback(url, get_refs_callback, &data, 0, options);
 
@@ -150,6 +165,7 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options,
 		if (fclose(data.fh))
 			die_errno("unable to write to %s", data.tmpname);
 	}
+	stop_progress(&data.progress);
 
 	*is_bundle = data.is_bundle > 0;
 	return ret;
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 13/14] remote-curl: resume interrupted bundle transfers
From: Jeff King @ 2011-11-10  7:53 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

If we have a bundle file from a previous fetch that matches
this URL, then we should resume the transfer where we left
off.

Signed-off-by: Jeff King <peff@peff.net>
---
The second half of the diff is hard to read because I re-indent a big
chunk. It's much easier to see what's going on with "diff -b".

 remote-curl.c |   64 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 6b0820e..43ad1b6 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -9,6 +9,7 @@
 #include "sideband.h"
 #include "bundle.h"
 #include "progress.h"
+#include "dir.h"
 
 static struct remote *remote;
 static const char *url; /* always ends with a trailing slash */
@@ -171,6 +172,32 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options,
 	return ret;
 }
 
+static int resume_bundle(const char *url, const char *tmpname)
+{
+	struct get_refs_cb_data data;
+	int ret;
+
+	data.fh = fopen(tmpname, "ab");
+	if (!data.fh)
+		die_errno("unable to open %s", tmpname);
+
+	data.is_bundle = 1;
+	data.total = ftell(data.fh);
+	if (options.progress) {
+		data.progress = start_progress("Resuming bundle", 0);
+		display_progress(data.progress, 0);
+		display_throughput(data.progress, data.total);
+	}
+
+	ret = http_get_callback(url, get_refs_callback, &data, data.total, 0);
+
+	if (fclose(data.fh))
+		die_errno("unable to write to %s", tmpname);
+	stop_progress(&data.progress);
+
+	return ret;
+}
+
 static const char *url_to_bundle_tmpfile(const char *url)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -210,20 +237,35 @@ static int get_refs_from_url(const char *url, struct strbuf *out, int options,
 	free_discovery(last);
 
 	filename = url_to_bundle_tmpfile(url);
+	if (file_exists(filename)) {
+		struct strbuf trimmed = STRBUF_INIT;
 
-	strbuf_addf(&buffer, "%sinfo/refs", url);
-	if (!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) {
-		is_http = 1;
-		if (!strchr(url, '?'))
-			strbuf_addch(&buffer, '?');
-		else
-			strbuf_addch(&buffer, '&');
-		strbuf_addf(&buffer, "service=%s", service);
+		strbuf_addstr(&trimmed, url);
+		while (trimmed.len > 0 && trimmed.buf[trimmed.len-1] == '/')
+			strbuf_setlen(&trimmed, trimmed.len - 1);
+		refs_url = strbuf_detach(&trimmed, NULL);
+
+		http_ret = resume_bundle(refs_url, filename);
+		is_bundle = 1;
 	}
-	refs_url = strbuf_detach(&buffer, NULL);
+	else
+		http_ret = HTTP_MISSING_TARGET;
+
+	if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) {
+		strbuf_addf(&buffer, "%sinfo/refs", url);
+		if (!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) {
+			is_http = 1;
+			if (!strchr(url, '?'))
+				strbuf_addch(&buffer, '?');
+			else
+				strbuf_addch(&buffer, '&');
+			strbuf_addf(&buffer, "service=%s", service);
+		}
+		refs_url = strbuf_detach(&buffer, NULL);
 
-	http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE,
-				     filename, &is_bundle);
+		http_ret = get_refs_from_url(refs_url, &buffer, HTTP_NO_CACHE,
+					     filename, &is_bundle);
+	}
 
 	/* try again with "plain" url (no ? or & appended) */
 	if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) {
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* [PATCH 14/14] clone: give advice on how to resume a failed clone
From: Jeff King @ 2011-11-10  7:56 UTC (permalink / raw)
  To: git
In-Reply-To: <20111110074330.GA27925@sigill.intra.peff.net>

When clone fails, we usually delete the partial directory.
However, if cloning was fetching a bundle, that is resumable
and we should consider those results precious.

This patch detects when a partial bundle is present,
preserves the directory, and gives the user some advice
about how to resume.

Signed-off-by: Jeff King <peff@peff.net>
---
We could make "git clone ..." automatically resume, but I'm a little
nervous about that. I wrote a patch that did so, and it did work, but
there are a lot of little hiccups as we violate the assumption that the
directory didn't already exist (e.g., it writes multiple fetch refspec
lines to the config).

But more importantly, I really worry about destroying the safety valve
of not overwriting an existing directory. Yes, we can check to see that
it is a git directory and that it has a partially-downloaded bundle
file. But that could also describe an existing git repo with unstaged
changes. And you sure don't want to "checkout -f" over them.

So I'd rather at least start with giving the user some advice and
having them explicitly say "yeah, I do want to resume this".

 builtin/clone.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index efe8b6c..c242e20 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,36 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	return ret;
 }
 
+static int git_dir_is_resumable(const char *dir)
+{
+	const char *objects = mkpath("%s/objects", dir);
+	DIR *dh = opendir(objects);
+	struct dirent *de;
+
+	if (!dh)
+		return 0;
+
+	while ((de = readdir(dh))) {
+		if (!prefixcmp(de->d_name, "tmp_bundle_")) {
+			closedir(dh);
+			return 1;
+		}
+	}
+
+	closedir(dh);
+	return 0;
+}
+
+static void give_resume_advice(void)
+{
+	advise("Cloning failed, but partial results were saved.");
+	advise("You can resume the fetch with:");
+	advise("  git fetch");
+	if (!option_bare)
+		advise("  git checkout %s",
+		       option_branch ? option_branch : "master");
+}
+
 static const char *junk_work_tree;
 static const char *junk_git_dir;
 static pid_t junk_pid;
@@ -402,6 +432,10 @@ static void remove_junk(void)
 	if (getpid() != junk_pid)
 		return;
 	if (junk_git_dir) {
+		if (git_dir_is_resumable(junk_git_dir)) {
+			give_resume_advice();
+			return;
+		}
 		strbuf_addstr(&sb, junk_git_dir);
 		remove_dir_recursively(&sb, 0);
 		strbuf_reset(&sb);
-- 
1.7.7.2.7.g9f96f

^ permalink raw reply related

* Re: RFH: unexpected reflog behavior with --since=
From: Jeff King @ 2011-11-10  7:59 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org
In-Reply-To: <4EBB81EA.6060303@nextest.com>

On Wed, Nov 09, 2011 at 11:48:58PM -0800, Eric Raible wrote:

> > But it may also be a misfeature, because it's not clear what you're
> > actually trying to limit by. We have commit timestamps, of course, but
> > when we are walking reflogs, we also have reflog timestamps. Did you
> > actually want to say "show me all commits in the reflog, in reverse
> > reflog order, omitting commits that happened before time t"? Or did you
> > really mean "show me the reflog entries that happened before time t,
> > regardless of their commit timestamp"?
> 
> I meant "show me the reflog entries that happened *since* time t,
> regardless of their commit timestamp.

Err, yeah, sorry. Somehow in the middle of writing the email I got
turned backwards about which direction we were interested in.

But I think you get the point.

> Since -g is asking specifying for the reflog, and since the reflog has
> its own timestamps, I would expect that those timestamps be used.

Then I think my one-liner patch should do what you want. And now it's
not just anecdotal evidence that I think it's the right behavior. There
are two of us; we're _data_.

-Peff

^ permalink raw reply

* Re: RFH: unexpected reflog behavior with --since=
From: Eric Raible @ 2011-11-10  8:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org
In-Reply-To: <20111109222032.GB31535@sigill.intra.peff.net>

On 11/9/2011 2:20 PM, Jeff King wrote:
> On Wed, Nov 09, 2011 at 05:01:28PM -0500, Jeff King wrote:
> 
> This patch (which is below) turns out to be absurdly simple. And it
> actually still prints the original commit timestamp, because we end up
> reparsing it out of the commit object during the pretty-print phase.

Sweet!

> So I think the only decision is whether "--since" should respect the
> commit timestamps (and be used as a sort of "grep" filter for
> timestamps), or whether it should be respecting the fake history we
> create when doing a reflog walk.

When -g is specified it seems less surprising for --since to respect
the reflog's fake history.  That's what *I* expected, anyway.

> I think I am leaning towards the latter. It seems to me to be the more
> likely guess for what the user would want. And there is real benefit to
> doing it in git, since we can stop the traversal early. In the
> "grep-like" case, doing it inside git is not really any more efficient
> than filtering in a pipeline, like:
> 
>   git log -g --format='%ct %H' |
>   awk '{ print $2 if $1 < SOME_TIMESTAMP }'

And then the sha would have to be fed back into git to be useful, eh?

> Of course we could still offer both (with a "--reflog-since" type of
> option). We'd also need to turn off the optimization for "--since", and
> then check whether "--until" has a similar bug (and offer
> "--reflog-until").

I don't see the point of --reflog-since.  If the user specifies 'reflog'
(either directly or with -g), then can't we just use the reflog's timestamp?
Note: there might be good reasons, as my use of the reflog (and --since, for
that matter), has been very simplistic so far.

> diff --git a/reflog-walk.c b/reflog-walk.c
> index 5d81d39..2e5b270 100644
> --- a/reflog-walk.c
> +++ b/reflog-walk.c
> @@ -231,6 +231,7 @@ void fake_reflog_parent(struct reflog_walk_info *info, struct commit *commit)
>  	reflog = &commit_reflog->reflogs->items[commit_reflog->recno];
>  	info->last_commit_reflog = commit_reflog;
>  	commit_reflog->recno--;
> +	commit->date = reflog->timestamp;
>  	commit_info->commit = (struct commit *)parse_object(reflog->osha1);
>  	if (!commit_info->commit) {
>  		commit->parents = NULL;

Is this something you'd be willing to turn into a real patch?
I'm certainly not qualified.

Thanks - Eric

^ permalink raw reply

* Re: RFH: unexpected reflog behavior with --since=
From: Jeff King @ 2011-11-10  8:08 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org
In-Reply-To: <4EBB8596.6040507@nextest.com>

On Thu, Nov 10, 2011 at 12:04:38AM -0800, Eric Raible wrote:

> > I think I am leaning towards the latter. It seems to me to be the more
> > likely guess for what the user would want. And there is real benefit to
> > doing it in git, since we can stop the traversal early. In the
> > "grep-like" case, doing it inside git is not really any more efficient
> > than filtering in a pipeline, like:
> > 
> >   git log -g --format='%ct %H' |
> >   awk '{ print $2 if $1 < SOME_TIMESTAMP }'
> 
> And then the sha would have to be fed back into git to be useful, eh?

It's just illustrative. You could replace "%H" with the actual
information you're interested in.

> > Of course we could still offer both (with a "--reflog-since" type of
> > option). We'd also need to turn off the optimization for "--since", and
> > then check whether "--until" has a similar bug (and offer
> > "--reflog-until").
> 
> I don't see the point of --reflog-since.  If the user specifies 'reflog'
> (either directly or with -g), then can't we just use the reflog's timestamp?
> Note: there might be good reasons, as my use of the reflog (and --since, for
> that matter), has been very simplistic so far.

The only point would be to leave "--since" to act on the commit
timestamps, so that you don't have to resort to the external grepping I
mentioned above. However, I'm not convinced anybody even cares about
that use case.

I think the behavior you want is much more sensible.

> > diff --git a/reflog-walk.c b/reflog-walk.c
> > index 5d81d39..2e5b270 100644
> > --- a/reflog-walk.c
> > +++ b/reflog-walk.c
> > @@ -231,6 +231,7 @@ void fake_reflog_parent(struct reflog_walk_info *info, struct commit *commit)
> >  	reflog = &commit_reflog->reflogs->items[commit_reflog->recno];
> >  	info->last_commit_reflog = commit_reflog;
> >  	commit_reflog->recno--;
> > +	commit->date = reflog->timestamp;
> >  	commit_info->commit = (struct commit *)parse_object(reflog->osha1);
> >  	if (!commit_info->commit) {
> >  		commit->parents = NULL;
> 
> Is this something you'd be willing to turn into a real patch?
> I'm certainly not qualified.

Yes. We're in release freeze now, so I didn't even bother with sending
it to Junio. But also, I'd like to gather more opinions on whether the
design is the right thing (hopefully the implementation is Obviously
Correct. :) ).

-Peff

^ permalink raw reply

* Re: RFH: unexpected reflog behavior with --since=
From: Eric Raible @ 2011-11-10  8:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org
In-Reply-To: <20111110080851.GA28342@sigill.intra.peff.net>

On 11/10/2011 12:08 AM, Jeff King wrote:
>>>   git log -g --format='%ct %H' |
>>>   awk '{ print $2 if $1 < SOME_TIMESTAMP }'
>>
>> And then the sha would have to be fed back into git to be useful, eh?
> 
> It's just illustrative. You could replace "%H" with the actual
> information you're interested in.

Of course, my thinko.

> The only point would be to leave "--since" to act on the commit
> timestamps, so that you don't have to resort to the external grepping I
> mentioned above. However, I'm not convinced anybody even cares about
> that use case.
> 
> I think the behavior you want is much more sensible.

Me too!

>> Is this something you'd be willing to turn into a real patch?
>> I'm certainly not qualified.
> 
> Yes. We're in release freeze now, so I didn't even bother with sending
> it to Junio. But also, I'd like to gather more opinions on whether the
> design is the right thing (hopefully the implementation is Obviously
> Correct. :) ).

I think it's hard to argue that the current behavior (as illustrated with
my original example) makes sense.  Or that your patch is overly complicated.
But giving people time to chime in it definitely TRTTD.

- Eric

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Johan Herland @ 2011-11-10  8:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Ted Ts'o, Shawn Pearce, git, James Bottomley,
	Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <7v4nydurzh.fsf@alter.siamese.dyndns.org>

On Wed, Nov 9, 2011 at 18:26, Junio C Hamano <gitster@pobox.com> wrote:
>  - "git notes" is represented as a commit that records a tree that holds
>   the entire mapping from commit to its annotations, and the only way to
>   transferr it is to send it together with its history as a whole. It
>   does not have the nice auto-following property that transfers only the
>   relevant annotations.

True. However, consider these mitigating factors:

 - The annotations in question (the "signing" of commits) are all intended to
   be merged eventually (i.e. there is no reason for a developer to (after the
   fact) sign a commit that will never end up in the public record). Therefore,
   most or all of the notes in the notes tree are already relevant, or
will become
   relevant in the near future (when the associated commits are merged).

 - Additionally, you could organize these notes into two (or more) notes trees,
   one for merged/official annotations, and one for unmerged/pending
annotations.
   Then make the relevant tools (e.g. "git merge") transfer notes from one tree
   to the other, thereby making sure that the "official" record only contains
   notes that are relevant to the merged history.

 - Finally, there's always "git notes prune" to purge annotations for commits
   that ended up never being merged.

My point is that although "notes" might end up transferring more annotations
than strictly necessary, I believe that in practice all the notes
being transferred
are already (or will soon become) relevant.

>  + "git notes" maps the commits to its annotations in the right direction;
>   the object name of an annotated object to its annotation.
>
> In the longer term, I think we would need to extend the system in the
> following way:
>
>  - Introduce a mapping machanism that can be locally used to map names of
>   the objects being annotated to names of other objects (most likely
>   blobs but there is nothing that fundamentally prevents you from
>   annotating a commit with a tree). The current "git notes" might be a
>   perfectly suitable representation of this, or it may turn out to be
>   lacking (I haven't thought things through), but the important point is
>   that this "mapping store" is _local_. fsck, repack and prune need to be
>   told that objects that store the annotation are reachable from the
>   annotated objects.

IMHO this is precisely what "git notes" does today.

>  - Introduce a protocol extension to transfer this mapping information for
>   objects being transferred in an efficient way. When "rev-list --objects
>   have..want" tells us that the receiving end (in either fetch/push
>   direction) would have an object at the end of the primary transfer
>   (note that I did not say "an object will be sent in this transfer
>   transaction"; "have" does not come into the picture), we make sure that
>   missing annotations attached to the object is also transferred, and new
>   mapping is registered at the receiving end.
>
> The detailed design for the latter needs more thought. The auto-following
> of tags works even if nothing is being fetched in the primary transfer
> (i.e. "git fetch" && "git fetch" back to back to update our origin/master
> with the master at the origin) when a new tag is added to ancient part of
> the history that leads to the master at the origin, but this is exactly
> because the sending end advertises all the available tags and the objects
> they point at so that we can tell what new tags added to an old object is
> missing from the receiving end. This obviously would not scale well when
> we have tens of thousands of objects to annotate. Perhaps an entry in the
> "mapping store" would record:
>
>  - The object name of the object being annotated;
>
>  - The object name of the annotation;
>
>  - The "timestamp", i.e. when the association between the above two was
>   made--this can be local to the repository and a simple counter would
>   do.
>
> and also maintain the last "timestamp" this repository sent annotations to
> the remote (one timestamp per remote repository). When we push, we would
> send annotations pertaining to the object reachable from what we are
> pushing (not limited by what they already have, as the whole point of this
> exercise is to allow us to transfer annotations added to an object long
> after the object was created and sent to the remote) that is newer than
> that "timestamp". Similarly, when fetching, we would send the "timestamp"
> this repository last fetched annotations from the other end (which means
> we would need one such "timestamp" per remote repository) and let the
> remote side decide the set of new annotations they added since we last
> synched that are on objects reachable from what we "want".
>
> Or something like that.

You would also have to keep track of deleted annotations, to enable the local
side to delete an annotation corresponding to an already-deleted annotation
on the remote side.

Pretty soon, you end up having to record something similar to a DAG,
describing the history of manipulating these annotations. At that point, your
"timestamp" calculation starts to look very similar to the "have..want"
calculation already done when transferring "regular" refs. At which point you
have a system that is very similar to what "git notes" does today...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply

* Re: RFH: unexpected reflog behavior with --since=
From: Jay Soffian @ 2011-11-10  8:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Raible, git@vger.kernel.org
In-Reply-To: <20111110080851.GA28342@sigill.intra.peff.net>

On Thu, Nov 10, 2011 at 3:08 AM, Jeff King <peff@peff.net> wrote:
> it to Junio. But also, I'd like to gather more opinions on whether the
> design is the right thing (hopefully the implementation is Obviously

Makes sense to me, so you're at +3.

j.

^ permalink raw reply

* Re: Problem with git-svn with limited SVN access
From: Antoine Bonavita @ 2011-11-10  8:36 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git
In-Reply-To: <201111092338.26164.trast@student.ethz.ch>

Thomas,

Thanks a lot for taking the time to look at my problem.

On 11/09/2011 11:38 PM, Thomas Rast wrote:
> Antoine Bonavita wrote:
>> ### If I try to add one of the branches manually:
>> branches = branches/XXX:refs/remotes/branches/XXX
>>   >  git svn fetch
>> One '*' is needed in glob: 'branches/XXX'
>
> I think having several fetch specs should do the trick, although I
> cannot easily test with actual permissions.
>
> You can start configuring the repo with
>
>    git init
>    git svn init svn://server/ -T trunk
>
> to get an initial layout.  The .git/config will look like
>
>    [svn-remote "svn"]
>            url = svn://server/
>            fetch = trunk:refs/remotes/trunk
>
> The clue is that the config says 'fetch', not 'trunk'.  Much like with
> git remotes, you can add more fetch specs along the lines of
>
>            fetch = branches/XXX:refs/remotes/svn/XXX
>
> or whatever layout you prefer.
I did what you suggested. Now my .git/config looks like:
[svn-remote "svn"]
         url = svn://server/aaa/AAA
         fetch = trunk:refs/remotes/trunk
        fetch = branches/XXX:refs/remotes/svn/XXX
  and here is the result:
 > git svn fetch
W: Item is not readable: Item is not readable at 
/usr/libexec/git-core/git-svn line 1782

Error from SVN, (220001): Item is not readable: Item is not readable

Any other idea ?
Or is there a way to get more debug info from git-svn. May be it would 
give us some insight on what it is trying to do and failing to.

Thanks,

Antoine.

>
> Please tell us whether that works even in the face of restrictions on
> branches/ itself :-)
>

-- 
Antoine Bonavita (antoine@stickyadstv.com)
Envoyé de mon PC. Moi je suis Fedora.

^ permalink raw reply

* [PATCH] mktree: fix a memory leak in write_tree()
From: Liu Yuan @ 2011-11-10  8:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

From: Liu Yuan <tailai.ly@taobao.com>

We forget to call strbuf_release to release the buf memory.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
 builtin/mktree.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 098395f..4ae1c41 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -60,6 +60,7 @@ static void write_tree(unsigned char *sha1)
 	}
 
 	write_sha1_file(buf.buf, buf.len, tree_type, sha1);
+	strbuf_release(&buf);
 }
 
 static const char *mktree_usage[] = {
-- 
1.7.6.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox