Git development

Git development
 help / color / mirror / Atom feed

* [PATCHv2] Improve use of select in http backend
From: Mika Fischer @ 2011-11-02 20:21 UTC (permalink / raw)
  To: git; +Cc: gitster, daniel

I've split my previous patch into two, since the two parts actually do
different things.

The first patch uses curl_multi_fdset to use the file descriptors of the http
connections in the call to select, in effect waking up immediately when new
data arrives.

The second patch uses curl_multi_timeout (if available) to get a better
recommendation for how long to sleep from curl instead of always sleeping
for 50ms.

^ permalink raw reply

* [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Mika Fischer @ 2011-11-02 20:21 UTC (permalink / raw)
  To: git; +Cc: gitster, daniel, Mika Fischer
In-Reply-To: <1320265288-12647-1-git-send-email-mika.fischer@zoopnet.de>

Instead of sleeping unconditionally for a 50ms, when no data can be read
from the http connection(s), use curl_multi_fdset to obtain the actual
file descriptors of the open connections and use them in the select call.
This way, the 50ms sleep is interrupted when new data arrives.
---
 http.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/http.c b/http.c
index a4bc770..ae92318 100644
--- a/http.c
+++ b/http.c
@@ -664,14 +664,14 @@ void run_active_slot(struct active_request_slot *slot)
 		}
 
 		if (slot->in_use && !data_received) {
-			max_fd = 0;
+			max_fd = -1;
 			FD_ZERO(&readfds);
 			FD_ZERO(&writefds);
 			FD_ZERO(&excfds);
+			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
 			select_timeout.tv_sec = 0;
 			select_timeout.tv_usec = 50000;
-			select(max_fd, &readfds, &writefds,
-			       &excfds, &select_timeout);
+			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
 		}
 	}
 #else
-- 
1.7.8.rc0.33.g09c6f.dirty

^ permalink raw reply related

* [PATCH 2/2] http.c: Use timeout suggested by curl instead of fixed 50ms timeout
From: Mika Fischer @ 2011-11-02 20:21 UTC (permalink / raw)
  To: git; +Cc: gitster, daniel, Mika Fischer
In-Reply-To: <1320265288-12647-1-git-send-email-mika.fischer@zoopnet.de>

Recent versions of curl can suggest a period of time the library user
should sleep and try again, when curl is blocked on reading or writing
(or connecting). Use this timeout instead of always sleeping for 50ms.
---
 http.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/http.c b/http.c
index ae92318..e91a2ab 100644
--- a/http.c
+++ b/http.c
@@ -649,6 +649,9 @@ void run_active_slot(struct active_request_slot *slot)
 	fd_set excfds;
 	int max_fd;
 	struct timeval select_timeout;
+#if LIBCURL_VERSION_NUM >= 0x070f04
+	long curl_timeout;
+#endif
 	int finished = 0;
 
 	slot->finished = &finished;
@@ -664,13 +667,28 @@ void run_active_slot(struct active_request_slot *slot)
 		}
 
 		if (slot->in_use && !data_received) {
+#if LIBCURL_VERSION_NUM >= 0x070f04
+			curl_multi_timeout(curlm, &curl_timeout);
+			if (curl_timeout == 0) {
+				continue;
+			} else if (curl_timeout == -1) {
+				select_timeout.tv_sec  = 0;
+				select_timeout.tv_usec = 50000;
+			} else {
+				select_timeout.tv_sec  =  curl_timeout / 1000;
+				select_timeout.tv_usec = (curl_timeout % 1000) * 1000;
+			}
+#else
+			select_timeout.tv_sec  = 0;
+			select_timeout.tv_usec = 50000;
+#endif
+
 			max_fd = -1;
 			FD_ZERO(&readfds);
 			FD_ZERO(&writefds);
 			FD_ZERO(&excfds);
 			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
-			select_timeout.tv_sec = 0;
-			select_timeout.tv_usec = 50000;
+
 			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
 		}
 	}
-- 
1.7.8.rc0.33.g09c6f.dirty

^ permalink raw reply related

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Jeff King @ 2011-11-02 20:32 UTC (permalink / raw)
  To: Mika Fischer; +Cc: git, gitster, daniel
In-Reply-To: <1320265288-12647-2-git-send-email-mika.fischer@zoopnet.de>

On Wed, Nov 02, 2011 at 09:21:27PM +0100, Mika Fischer wrote:

> Instead of sleeping unconditionally for a 50ms, when no data can be read
> from the http connection(s), use curl_multi_fdset to obtain the actual
> file descriptors of the open connections and use them in the select call.
> This way, the 50ms sleep is interrupted when new data arrives.
> ---
>  http.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/http.c b/http.c
> index a4bc770..ae92318 100644
> --- a/http.c
> +++ b/http.c
> @@ -664,14 +664,14 @@ void run_active_slot(struct active_request_slot *slot)
>  		}
>  
>  		if (slot->in_use && !data_received) {
> -			max_fd = 0;
> +			max_fd = -1;
>  			FD_ZERO(&readfds);
>  			FD_ZERO(&writefds);
>  			FD_ZERO(&excfds);
> +			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
>  			select_timeout.tv_sec = 0;
>  			select_timeout.tv_usec = 50000;
> -			select(max_fd, &readfds, &writefds,
> -			       &excfds, &select_timeout);
> +			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
>  		}
>  	}

Do we still need to care about data_received?

My understanding was that the code was originally trying to do:

  1. Call curl, maybe get some data.

  2. If we got data, then ask curl against immediately for some data.

  3. Otherwise, sleep 50ms and then ask curl again.

But now that we are actually selecting on the proper descriptors, it
should now be safe to just do:

  1. Call curl, maybe get some data.

  2. Call select, which will wake immediately if curl is going to get
     data.

At least that's my reading. I am working on unrelated patches that clean
up the handling of data_received, but if it could go away altogether,
that would be even simpler.

-Peff

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Jeff King @ 2011-11-02 20:35 UTC (permalink / raw)
  To: Mika Fischer; +Cc: git, gitster, daniel
In-Reply-To: <20111102203221.GB5628@sigill.intra.peff.net>

On Wed, Nov 02, 2011 at 04:32:21PM -0400, Jeff King wrote:

> At least that's my reading. I am working on unrelated patches that clean
> up the handling of data_received, but if it could go away altogether,
> that would be even simpler.

That patch, btw, looks like this:

-- >8 --
Subject: [PATCH] http: remove "local" member from slot struct

The curl-multi http code does something like this:

  while (!finished) {
	  try_to_read_from_slots();
	  if (!data_received)
		  wait_for_50_ms();
  }

Which is horrible enough in itself, because of the
hard-coded 50ms wait.

But there's some additional complexity: the method for
finding whether we received data is to actually run ftell()
on the file we are writing to to see if it got anything. So
the "local" member of the slot struct contains the FILE
pointer. Except that sometimes we don't have a file, because
we're writing to a strbuf. In that case, since curl calls
our custom callback, we just increment the data_received
flag when curl gives data to our callback.

Let's do the same thing for the write-to-file case as we do
for the write-to-strbuf case: use a thin wrapper callback
and increment the received flag. This makes both methods
consistent with each other, and saves us from managing the
"local" struct member at all, reducing the code size.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c |   23 +++++++----------------
 http.h |    1 -
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/http.c b/http.c
index a4bc770..99386ef 100644
--- a/http.c
+++ b/http.c
@@ -93,6 +93,12 @@ curlioerr ioctl_buffer(CURL *handle, int cmd, void *clientp)
 }
 #endif
 
+size_t fwrite_file(char *ptr, size_t eltsize, size_t nmemb, void *handle)
+{
+	data_received++;
+	return fwrite(ptr, eltsize, nmemb, handle);
+}
+
 size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
 {
 	size_t size = eltsize * nmemb;
@@ -538,7 +544,6 @@ struct active_request_slot *get_active_slot(void)
 
 	active_requests++;
 	slot->in_use = 1;
-	slot->local = NULL;
 	slot->results = NULL;
 	slot->finished = NULL;
 	slot->callback_data = NULL;
@@ -642,8 +647,6 @@ void step_active_slots(void)
 void run_active_slot(struct active_request_slot *slot)
 {
 #ifdef USE_CURL_MULTI
-	long last_pos = 0;
-	long current_pos;
 	fd_set readfds;
 	fd_set writefds;
 	fd_set excfds;
@@ -656,13 +659,6 @@ void run_active_slot(struct active_request_slot *slot)
 		data_received = 0;
 		step_active_slots();
 
-		if (!data_received && slot->local != NULL) {
-			current_pos = ftell(slot->local);
-			if (current_pos > last_pos)
-				data_received++;
-			last_pos = current_pos;
-		}
-
 		if (slot->in_use && !data_received) {
 			max_fd = 0;
 			FD_ZERO(&readfds);
@@ -818,13 +814,12 @@ static int http_request(const char *url, void *result, int target, int options)
 		if (target == HTTP_REQUEST_FILE) {
 			long posn = ftell(result);
 			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
-					 fwrite);
+					 fwrite_file);
 			if (posn > 0) {
 				strbuf_addf(&buf, "Range: bytes=%ld-", posn);
 				headers = curl_slist_append(headers, buf.buf);
 				strbuf_reset(&buf);
 			}
-			slot->local = result;
 		} else
 			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
 					 fwrite_buffer);
@@ -871,7 +866,6 @@ static int http_request(const char *url, void *result, int target, int options)
 		ret = HTTP_START_FAILED;
 	}
 
-	slot->local = NULL;
 	curl_slist_free_all(headers);
 	strbuf_release(&buf);
 
@@ -1066,7 +1060,6 @@ void release_http_pack_request(struct http_pack_request *preq)
 	if (preq->packfile != NULL) {
 		fclose(preq->packfile);
 		preq->packfile = NULL;
-		preq->slot->local = NULL;
 	}
 	if (preq->range_header != NULL) {
 		curl_slist_free_all(preq->range_header);
@@ -1088,7 +1081,6 @@ int finish_http_pack_request(struct http_pack_request *preq)
 
 	fclose(preq->packfile);
 	preq->packfile = NULL;
-	preq->slot->local = NULL;
 
 	lst = preq->lst;
 	while (*lst != p)
@@ -1157,7 +1149,6 @@ struct http_pack_request *new_http_pack_request(
 	}
 
 	preq->slot = get_active_slot();
-	preq->slot->local = preq->packfile;
 	curl_easy_setopt(preq->slot->curl, CURLOPT_FILE, preq->packfile);
 	curl_easy_setopt(preq->slot->curl, CURLOPT_WRITEFUNCTION, fwrite);
 	curl_easy_setopt(preq->slot->curl, CURLOPT_URL, preq->url);
diff --git a/http.h b/http.h
index 3c332a9..7429381 100644
--- a/http.h
+++ b/http.h
@@ -49,7 +49,6 @@ struct slot_results {
 
 struct active_request_slot {
 	CURL *curl;
-	FILE *local;
 	int in_use;
 	CURLcode curl_result;
 	long http_code;
-- 
1.7.7.rc3.12.g571d67

^ permalink raw reply related

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Michael J Gruber @ 2011-11-02 21:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <7v1utqjqre.fsf@alter.siamese.dyndns.org>

Junio C Hamano venit, vidit, dixit 02.11.2011 19:58:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
> 
>> The advantage of tags is that they can be added without rewriting the
>> commit, of course.
> 
> And you did neither think about the downsides of tags, nor read what
> others already explained for you?

We're just weighing things differently here, and no accusations of
"misinformation" or "not thinking" will change this.

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Junio C Hamano @ 2011-11-02 21:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <CA+55aFz7TeQQH3D4Tpp31cZYZoQKeK37jouo+2Kh61Wa07knfw@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> And "add a fake empty commit just for the signature" is not the answer
> either - because that is clearly inferior to the tags we already had.
>
> I dunno. Did I miss something? As far as I can tell, the signed tags
> that we've had since day one are *clearly* much better in very
> fundamental ways.

Ok, back to the drawing board (which is not a loss as I wasn't expecting
this to be in the official release in upcoming 1.7.8 anyway).

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Junio C Hamano @ 2011-11-02 21:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Mika Fischer, git, gitster, daniel
In-Reply-To: <20111102203543.GC5628@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Wed, Nov 02, 2011 at 04:32:21PM -0400, Jeff King wrote:
>
>> At least that's my reading. I am working on unrelated patches that clean
>> up the handling of data_received, but if it could go away altogether,
>> that would be even simpler.
>
> That patch, btw, looks like this:
>
> -- >8 --
> Subject: [PATCH] http: remove "local" member from slot struct
>
> The curl-multi http code does something like this:
>
>   while (!finished) {
> 	  try_to_read_from_slots();
> 	  if (!data_received)
> 		  wait_for_50_ms();
>   }
>
> ...
> Let's do the same thing for the write-to-file case as we do
> for the write-to-strbuf case: use a thin wrapper callback
> and increment the received flag. This makes both methods
> consistent with each other, and saves us from managing the
> "local" struct member at all, reducing the code size.

Looks very sensible.

^ permalink raw reply

* Re: long fsck time
From: Jeff King @ 2011-11-02 21:33 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List
In-Reply-To: <CACsJy8B=5mEWoOBkrTfmJ+p7HxqJM97zdG-k71oW81-3XxuO_Q@mail.gmail.com>

On Wed, Nov 02, 2011 at 07:10:26PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Wed, Nov 2, 2011 at 7:06 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> > On git.git
> >
> > $ /usr/bin/time git fsck
> > 333.25user 4.28system 5:37.59elapsed 99%CPU (0avgtext+0avgdata
> > 420080maxresident)k
> > 0inputs+0outputs (0major+726560minor)pagefaults 0swaps
> >
> > That's really long time, perhaps we should print progress so users
> > know it's still running?
> 
> Ahh.. --verbose. Sorry for the noise. Still good to show the number of
> checked objects though.

fsck --verbose is _really_ verbose. It could probably stand to have some
progress meters sprinkled throughout. The patch below produces this on
my git.git repo:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  Verifying packs: 100% (7/7), done.
  Checking objects (pack 1/7): 100% (241/241), done.
  Checking objects (pack 2/7): 100% (176/176), done.
  Checking objects (pack 3/7): 100% (312/312), done.
  Checking objects (pack 4/7): 100% (252/252), done.
  Checking objects (pack 5/7): 100% (353/353), done.
  Checking objects (pack 6/7): 100% (375/375), done.
  Checking objects (pack 7/7): 100% (171079/171079), done.

which gives reasonably smooth progress. The longest hang is that
"Verifying pack" 7 is slow (I believe it's doing a sha1 over the whole
thing). If you really wanted to get fancy, you could probably do a
throughput meter as we sha1 the whole contents.

Patch is below. It would need --{no-,}progress support on the command
line, and to check isatty(2) before it would be acceptable.

---
diff --git a/builtin/fsck.c b/builtin/fsck.c
index df1a88b..481de4e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -11,6 +11,7 @@
 #include "fsck.h"
 #include "parse-options.h"
 #include "dir.h"
+#include "progress.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -512,15 +513,19 @@ static void get_default_heads(void)
 static void fsck_object_dir(const char *path)
 {
 	int i;
+	struct progress *progress;
 
 	if (verbose)
 		fprintf(stderr, "Checking object directory\n");
 
+	progress = start_progress("Checking object directories", 256);
 	for (i = 0; i < 256; i++) {
 		static char dir[4096];
 		sprintf(dir, "%s/%02x", path, i);
 		fsck_dir(i, dir);
+		display_progress(progress, i+1);
 	}
+	stop_progress(&progress);
 	fsck_sha1_list();
 }
 
@@ -622,19 +627,36 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 	if (check_full) {
 		struct packed_git *p;
+		int i, nr_packs = 0;
+		struct progress *progress;
 
 		prepare_packed_git();
 		for (p = packed_git; p; p = p->next)
+			nr_packs++;
+
+		progress = start_progress("Verifying packs", nr_packs);
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
 			/* verify gives error messages itself */
 			verify_pack(p);
+			display_progress(progress, i);
+		}
+		stop_progress(&progress);
 
-		for (p = packed_git; p; p = p->next) {
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
+			char buf[32];
 			uint32_t j, num;
 			if (open_pack_index(p))
 				continue;
 			num = p->num_objects;
-			for (j = 0; j < num; j++)
+
+			snprintf(buf, sizeof(buf), "Checking objects (pack %d/%d)",
+				 i, nr_packs);
+			progress = start_progress(buf, num);
+			for (j = 0; j < num; j++) {
 				fsck_sha1(nth_packed_object_sha1(p, j));
+				display_progress(progress, j+1);
+			}
+			stop_progress(&progress);
 		}
 	}
 

^ permalink raw reply related

* Re: [ANNOUNCE] Git 1.7.7.2
From: Stefan Roas @ 2011-11-02 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linux Kernel
In-Reply-To: <7v7h3jl3kw.fsf@alter.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 531 bytes --]

Hi Junio,

is it possible that you forgot to update the GIT-VERSION-GEN with the
release of 1.7.7.2? I stll get version 1.7.7.1 from the tarball on
http://git-scm.com/ and when building from the git repository itself.

Regards,
  Stefan

-- 
Stefan Roas                                         sroas@roath.org
Joh.-Seb.-Bach-Str. 4                            D-91083 Baiersdorf
-------------------------------------------------------------------
Key fingerprint = 557C 99BE 865B 1463 2A44  7936 C662 8970 4DA5 50B8


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Jeff King @ 2011-11-02 22:06 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <20111031090717.GA24978@elie.hsd1.il.comcast.net>

On Mon, Oct 31, 2011 at 04:07:18AM -0500, Jonathan Nieder wrote:

> Something like Jeff's "priming the well with a server-specified
> bundle" proposal[2] might be a good way to make the same trick
> transparent to clients in the future.

Yes, that is one of the use cases I hope to address. But it will require
the publisher specifying a mirror location (it's possible we could add
some kind of automagic "hit a bundler service first" config option,
though I fear that the existing small-time bundler services would
crumble under the load).

So in the general case (and in the meantime), you may have to learn to
manually prime the repo using a bundle.

I haven't started on the patches for communicating mirror sites between
the server and client, but I did just write some patches to handle "git
fetch http://host/path/to/file.bundle" automatically, which is the first
step. They need a few finishing touches and some testing, though.

> Even with that, later fetches, which grab a pack generated on the fly
> to only contain the objects not already fetched, are generally not
> resumable.  Overcoming that would presumably require larger protocol
> changes, and I don't know of anyone working on it.  (My workaround
> when in a setup where this mattered was to use the old-fashioned
> "dumb" http protocol.  It worked fine.)

My goal was for the mirror communication between client and server to be
something like:

  - if you don't have object XXXXXX, then prime with URL
    http://host/bundle1

  - if you don't have object YYYYYY, then prime with URL
    http://host/bundle2

and so forth. A cloning client would grab the first bundle, then the
second, and then hit the real repo via the git protocol. A client who
had previously cloned might have XXX, but would now grab bundle2, and
then hit the real repo.

So depending on how often the server side feels like creating new
bundles, you would get most of the changes via bundles, and then only
be getting a small number of objects via git.

The downside of cumulative fetching is that the bundles can only serve
well-known checkpoints. So if you have a timeline like this:

  t0: server publishes bundle/mirror config with one line (the XXX bit
      above)

  t1: you clone, getting the whole bundle. No waste, because you had
      nothing in the first place, and you needed everything.

  t2: you fetch again, getting N commits worth of history via the git
      protocol

  t3: server decides a lot of new objects (let's say M commits worth)
      have accumulated, and generates a new line (the YYY line).

  t4: you fetch, see that you don't yet have YYY, and grab the second
      bundle

But in t4 you grabbed a bundle containing M commits, when you already
had the first N of them. So you actually wasted bandwidth getting
objects you already had. The only benefit is that you grabbed a static
file, which is resumable.

So I suspect there is some black magic involved in deciding when to
create a new bundle, and at what tip. If you create a bundle once a
month, but include only commits up to a week ago, then people pulling
weekly will never grab the bundle, but people pulling less frequently
will get the whole month as a bundle.

A secondary issue is also that in a scheme like this, your mirror list
will grow without bound. So you'd want to periodically repack everything
into a single bundle. But then people who are fetching wouldn't want
that, as it is just an exacerbated version of the same problem above.

Which is all a roundabout way of saying that the git protocol is really
the sane way to do efficient transfers. An alternative, much simpler
scheme would be for the server to just say:

  - if you have nothing, then prime with URL http://host/bundle

And then _only_ clone would bother with checking mirrors. People doing
fetch would be expected to do it often enough that not being resumable
isn't a big deal.

-Peff

^ permalink raw reply

* Re: [PATCH] Escape file:// URL's to meet subversion SVN::Ra requirements
From: Eric Wong @ 2011-11-02 22:09 UTC (permalink / raw)
  To: Ben Walton; +Cc: Jonathan Nieder, git
In-Reply-To: <1320260449-sup-479@pinkfloyd.chass.utoronto.ca>

Ben Walton <bwalton@artsci.utoronto.ca> wrote:
> Sorry for the clumsy patch.

I don't have much time to help you fix it, but I got numerous errors on
SVN 1.6.x (svn 1.6.12).  Can you make sure things continue to work on
1.6 and earlier, also?

Maybe just enable the escaping for file:// on >= SVN 1.7

Here are the tests that failed for me:

make[1]: *** [t9100-git-svn-basic.sh] Error 1
make[1]: *** [t9103-git-svn-tracked-directory-removed.sh] Error 1
make[1]: *** [t9104-git-svn-follow-parent.sh] Error 1
make[1]: *** [t9105-git-svn-commit-diff.sh] Error 1
make[1]: *** [t9107-git-svn-migrate.sh] Error 1
make[1]: *** [t9108-git-svn-glob.sh] Error 1
make[1]: *** [t9109-git-svn-multi-glob.sh] Error 1
make[1]: *** [t9110-git-svn-use-svm-props.sh] Error 1
make[1]: *** [t9111-git-svn-use-svnsync-props.sh] Error 1
make[1]: *** [t9114-git-svn-dcommit-merge.sh] Error 1
make[1]: *** [t9116-git-svn-log.sh] Error 1
make[1]: *** [t9117-git-svn-init-clone.sh] Error 1
make[1]: *** [t9118-git-svn-funky-branch-names.sh] Error 1
make[1]: *** [t9120-git-svn-clone-with-percent-escapes.sh] Error 1
make[1]: *** [t9125-git-svn-multi-glob-branch-names.sh] Error 1
make[1]: *** [t9127-git-svn-partial-rebuild.sh] Error 1
make[1]: *** [t9128-git-svn-cmd-branch.sh] Error 1
make[1]: *** [t9130-git-svn-authors-file.sh] Error 1
make[1]: *** [t9135-git-svn-moved-branch-empty-file.sh] Error 1
make[1]: *** [t9136-git-svn-recreated-branch-empty-file.sh] Error 1
make[1]: *** [t9141-git-svn-multiple-branches.sh] Error 1
make[1]: *** [t9145-git-svn-master-branch.sh] Error 1
make[1]: *** [t9146-git-svn-empty-dirs.sh] Error 1
make[1]: *** [t9150-svk-mergetickets.sh] Error 1
make[1]: *** [t9151-svn-mergeinfo.sh] Error 1
make[1]: *** [t9153-git-svn-rewrite-uuid.sh] Error 1
make[1]: *** [t9154-git-svn-fancy-glob.sh] Error 1
make[1]: *** [t9155-git-svn-fetch-deleted-tag.sh] Error 1
make[1]: *** [t9156-git-svn-fetch-deleted-tag-2.sh] Error 1
make[1]: *** [t9157-git-svn-fetch-merge.sh] Error 1
make[1]: *** [t9159-git-svn-no-parent-mergeinfo.sh] Error 1
make[1]: *** [t9161-git-svn-mergeinfo-push.sh] Error 1

^ permalink raw reply

* Re: [ANNOUNCE] Git 1.7.7.2
From: Junio C Hamano @ 2011-11-02 22:30 UTC (permalink / raw)
  To: Stefan Roas; +Cc: git, Linux Kernel
In-Reply-To: <20111102214725.GA2860@geminga.roas.networks.roath.org>

Stefan Roas <sroas@roath.org> writes:

> is it possible that you forgot to update the GIT-VERSION-GEN with the
> release of 1.7.7.2? I stll get version 1.7.7.1 from the tarball on
> http://git-scm.com/ and when building from the git repository itself.

Probably.

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Mika Fischer @ 2011-11-02 22:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, daniel
In-Reply-To: <20111102203221.GB5628@sigill.intra.peff.net>

On Wed, Nov 2, 2011 at 21:32, Jeff King <peff@peff.net> wrote:
> Do we still need to care about data_received?
>
> My understanding was that the code was originally trying to do:
>
>  1. Call curl, maybe get some data.
>
>  2. If we got data, then ask curl against immediately for some data.
>
>  3. Otherwise, sleep 50ms and then ask curl again.

Yes, that's exactly what it did.

> But now that we are actually selecting on the proper descriptors, it
> should now be safe to just do:
>
>  1. Call curl, maybe get some data.
>
>  2. Call select, which will wake immediately if curl is going to get
>     data.

The only problem I can see is that curl_multi_fdset is not guaranteed
to return any fds. So in theory it could be possible that we don't get
fds, but we're actually reading stuff. In this case things would get
slow, because we would sleep for 50ms after every read...

However, I don't know if this is a case that actually comes up in the
real world. Maybe Daniel has some advice on this.

Best,
 Mika

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Daniel Stenberg @ 2011-11-02 22:40 UTC (permalink / raw)
  To: Mika Fischer; +Cc: Jeff King, git, gitster
In-Reply-To: <CAOs=hR+QqUpYuth8Uvi2o7pm1LO8ogO2pN7nrMchYj96Cutmww@mail.gmail.com>

On Wed, 2 Nov 2011, Mika Fischer wrote:

> The only problem I can see is that curl_multi_fdset is not guaranteed to 
> return any fds. So in theory it could be possible that we don't get fds, but 
> we're actually reading stuff. In this case things would get slow, because we 
> would sleep for 50ms after every read...
>
> However, I don't know if this is a case that actually comes up in the real 
> world. Maybe Daniel has some advice on this.

It doesn't really happen so it should be safe.

The case where no fds are returned is when libcurl cannot return a socket to 
wait for during name resolving (if your particular libcurl is built to use 
such a resolver backend - libcurl has several different ones). And during name 
resolving there won't be any data to read for the libcurl-app anyway.

-- 

  / daniel.haxx.se

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Junio C Hamano @ 2011-11-02 22:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <20111102220614.GB14108@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Which is all a roundabout way of saying that the git protocol is really
> the sane way to do efficient transfers. An alternative, much simpler
> scheme would be for the server to just say:
>
>   - if you have nothing, then prime with URL http://host/bundle
>
> And then _only_ clone would bother with checking mirrors. People doing
> fetch would be expected to do it often enough that not being resumable
> isn't a big deal.

I think that is a sensible place to start.

A more fancy conditional "If you have X then fetch this, if you have Y
fetch that, ..." sounds nice but depending on what branch you are fetching
the answer has to be different. If we were to do that, the natural place
for the server to give the redirect instruction to the client is after the
client finishes saying "want", and before the client starts saying "have".

^ permalink raw reply

* Re: git-p4: problem with commit 97a21ca50ef8
From: Michael Wookey @ 2011-11-02 22:42 UTC (permalink / raw)
  To: Vitor Antunes; +Cc: Pete Wyckoff, Git Mailing List, Luke Diamand
In-Reply-To: <loom.20111102T153631-769@post.gmane.org>

On 3 November 2011 01:43, Vitor Antunes <vitor.hda@gmail.com> wrote:
> Michael Wookey <michaelwookey <at> gmail.com> writes:
>> Of course, I'd love to have git-p4 work seamlessly for this scenario.
>> Even Perforce have a KB article on the limitation of the "apple"
>> filetype with git-p4:
>>
>>   http://kb.perforce.com/article/1417/git-p4
>>
> """
> Step 2: Download Git-p4
>
> Recommended version is ermshiperete’s branch, which is available from:
>
> https://github.com/ermshiperete/git-p4
>
> Note: Omit the “git-p4.py25” file, which is an older version that is no longer
> needed.
> Avoid Kernel.org’s Version of Git-p4
>
> Git’s main source at http://git-scm.com/download and
> http://www.kernel.org/pub/software/scm/git/ contains an older version of Git-p4
> with limitations that ermshiperete’s branch avoids.
> """
>
> I can almost guess _who_ wrote this KB ;)
>
> But this is really frustrating. Why can't people just cooperate to make sure the
> version in the main branch is the latest?

I tried your suggested version of git-p4 (at rev 630fb678c46c) and
unfortunately, the perforce repository fails to import. Firstly, there
was a problem with importing UTF-16 encoded files, secondly the
"apple" filetype files are still skipped.

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Jeff King @ 2011-11-02 23:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <7vwrbigna7.fsf@alter.siamese.dyndns.org>

On Wed, Nov 02, 2011 at 03:41:36PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Which is all a roundabout way of saying that the git protocol is really
> > the sane way to do efficient transfers. An alternative, much simpler
> > scheme would be for the server to just say:
> >
> >   - if you have nothing, then prime with URL http://host/bundle
> >
> > And then _only_ clone would bother with checking mirrors. People doing
> > fetch would be expected to do it often enough that not being resumable
> > isn't a big deal.
> 
> I think that is a sensible place to start.

OK. That had been my original intent, but somebody (you?) mentioned the
"if you have X" thing at the GitTogether, which got me thinking.

I don't mind starting slow, as long as we don't paint ourselves into a
corner for future expansion. I'll try to design the data format for
specifying the mirror locations with that extension in mind.

Even if the bundle thing ends up too wasteful, it may still be useful to
offer a "if you don't have X, go see Y" type of mirror when "Y" is
something efficient, like git:// at a faster host (i.e., the "I built 3
commits on top of Linus" case).

> A more fancy conditional "If you have X then fetch this, if you have Y
> fetch that, ..." sounds nice but depending on what branch you are fetching
> the answer has to be different. If we were to do that, the natural place
> for the server to give the redirect instruction to the client is after the
> client finishes saying "want", and before the client starts saying "have".

Agreed. I was really trying to avoid protocol extensions, though, at
least for an initial version. I'd like to see how far we can get doing
the simplest thing.

-Peff

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Junio C Hamano @ 2011-11-02 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <CA+55aFx_rAA6TJkZn1Zvu6u9UjxnmTVt0HpMnvaE_q9Sx-jzPg@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I hate how anonymous our branches are. Sure, we can use good names for
> them, but it was a mistake to think we should describe the repository
> (for gitweb), rather than the branch.
>
> Ok, "hate" is a strong word. I don't "hate" it. I don't even think
> it's a major design issue. But I do think that it would have been
> nicer if we had had some branch description model.
> ...
> Maybe just verifying the email message (with the suggested kind of
> change to "git request-pull") is actually the right approach. And what
> I should do is to just wrap my "git pull" in some script that I can
> just cut-and-paste the gpg-signed thing into, and which just does the
> "gpg --verify" on it, and then does the "git pull" after that.
>
> Because in many ways, "git request-pull" is when you do want to sign
> stuff. A developer might well want to push out his stuff for some
> random internal testing (linux-next, for example), and then only later
> decide "Ok, it was all good, now I want to make it 'official' and ask
> Linus to pull it", and sign it at *that* time, rather than when
> actually pushing it out.

You keep saying cut-and-paste, but do you mind feeding the e-mail text
itself to a tool, instead of cut-and-paste?

The reason I am wondering about this is because in another topic (also in
'next') cooking there is an extended support for topic description for the
branch that states what the purpose of the topic is why the requestor
wants you to have it (this information can be set and updated with "git
branch --edit-description").

A respond-to-request-pull wrapper you would use could be:

 - Get the e-mail from the standard input;
 - Pick up the signed bits and validate the signature;
 - Perform the requested fetch; and
 - Record the merge (or prepare .git/MERGE_MSG) with both the signed bits.

and the "signed bits" could include:

   - the repository and the branch you were expected to pull;
   - the topic description.

among other things the requestor can edit when request-pull message is
prepared.

That would get us back to your "the lieutenant tip is not so special, but
the merge commit the integrator makes using that tip has the signature for
this particular pull" model.

^ permalink raw reply

* Re: t5800-*.sh: Intermittent test failures
From: Sverre Rabbelier @ 2011-11-02 23:35 UTC (permalink / raw)
  To: Alex Riesen, Ævar Arnfjörð
  Cc: Junio C Hamano, Ramsay Jones, Jeff King, GIT Mailing-list,
	Jonathan Nieder
In-Reply-To: <CALxABCbKSi-aHezjyn5wJ0-BPW1PvvaC2i9VeV7yXOf4yCdx4Q@mail.gmail.com>

Heya,

On Wed, Nov 2, 2011 at 00:02, Alex Riesen <raa.lkml@gmail.com> wrote:
> On Tue, Nov 1, 2011 at 23:18, Junio C Hamano <gitster@pobox.com> wrote:
>> Alex Riesen <raa.lkml@gmail.com> writes:
>>
>>> On Sun, Sep 11, 2011 at 21:14, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>>>> ... these hangs *are* the failures of which I speak!  Yes, the script
>>>> doesn't get to declare a failure, but AFAIAC a hanging test (and it
>>>> isn't the same test # each time) is a failing test. :-D
>>>
>>> Was there any outcome of this discussion? I'm asking because I
>>> can reproduce this very reliably on a little server here.
>>
>> I do remember this discussion and recall seeing _no_ outcome.
>>
>> I did see the hang myself once or twice but did not and do not have a
>> reliable reproduction. I have been waiting for somebody to raise the issue
>> again ;-).
>>
>
> I think I managed to bisect it (between 1.7.6 and 1.7.7):
>
> $ git bisect start v1.7.7 v1.7.6
> ...
> $ git bisect good
> a515ebe9f1ac9bc248c12a291dc008570de505ca is the first bad commit
> commit a515ebe9f1ac9bc248c12a291dc008570de505ca
> Author: Sverre Rabbelier <srabbelier@gmail.com>
> Date:   Sat Jul 16 15:03:40 2011 +0200
>
>    transport-helper: implement marks location as capability
>
>    Now that the gitdir location is exported as an environment variable
>    this can be implemented elegantly without requiring any explicit
>    flushes nor an ad-hoc exchange of values.
>
>    Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
>    Acked-by: Jeff King <peff@peff.net>
>    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>
> :100644 100644 1ed7a5651ef5a2320c56856b5a1fe784e178ab23
> e9c832bfd3da7db771cc2113027d3e590dc51d59 M      git-remote-testgit.py
> :100644 100644 0cfc9ae9059ce121b567406d7941b71cd54b961c
> 74c3122df1835c45a6b621205fb18b4fc89af366 M      transport-helper.c
>
> Sadly, I'm going to be able to repeat the test in about 20 hours.

Ævar, this seems like something we could look at during the mini
GitTogether in Amsterdam this Saturday, no?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply

* Re: Fork freedesktop project to bitbucket, make changes, generate patch back to freedesktop?
From: Jeff King @ 2011-11-02 23:37 UTC (permalink / raw)
  To: Alec Taylor; +Cc: git
In-Reply-To: <CAO+9iGeHSsJz+7=N0BzmGGbkGN1P=CyNvxJWO_1nCNjiZZzetA@mail.gmail.com>

On Sat, Oct 29, 2011 at 02:35:42PM +1100, Alec Taylor wrote:

> I am working with a team extending the functionality of this project.
> 
> After many MANY adds, commits and pushes back and forth on the
> bitbucket project, we then want to send this freedesktop project a
> PATCH with the changes we've made.
> 
> Can you tell me the command I need to do this?

Do you want to send them one patch, or a series of patches?

If one, then you probably want to diff off of some known point (either
their current branch tip, or maybe some recently released version). And
then send them the resulting diff in an email. You can just use "git
diff" for this if you want, and include it in an email, or you can
actually create a new "squashed" commit in git, like this:

  git checkout v1.0 ;# or wherever you think they would want to apply
  git merge --squash your-branch
  git commit

and then use "git format-patch" to create a patch (and optionally
git-send-email to send it).

If you want to share the whole series, you can use format-patch to
create the series, but note that a patch series can only represent a
linear history. If you have a lot of merges from pushing back and forth,
you may want to linearize it first using "git rebase -i".

That's just a high level overview of what you'll need. You can try
reading up on those commands to get a better sense of exactly how you
want to proceed, or if you have more specific questions, ask.

-Peff

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: david @ 2011-11-02 23:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <7vsjm6gkte.fsf@alter.siamese.dyndns.org>

On Wed, 2 Nov 2011, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> I hate how anonymous our branches are. Sure, we can use good names for
>> them, but it was a mistake to think we should describe the repository
>> (for gitweb), rather than the branch.
>>
>> Ok, "hate" is a strong word. I don't "hate" it. I don't even think
>> it's a major design issue. But I do think that it would have been
>> nicer if we had had some branch description model.
>> ...
>> Maybe just verifying the email message (with the suggested kind of
>> change to "git request-pull") is actually the right approach. And what
>> I should do is to just wrap my "git pull" in some script that I can
>> just cut-and-paste the gpg-signed thing into, and which just does the
>> "gpg --verify" on it, and then does the "git pull" after that.
>>
>> Because in many ways, "git request-pull" is when you do want to sign
>> stuff. A developer might well want to push out his stuff for some
>> random internal testing (linux-next, for example), and then only later
>> decide "Ok, it was all good, now I want to make it 'official' and ask
>> Linus to pull it", and sign it at *that* time, rather than when
>> actually pushing it out.
>
> You keep saying cut-and-paste, but do you mind feeding the e-mail text
> itself to a tool, instead of cut-and-paste?

think webmail (i.e. gmail), to feed the e-mail itself to a tool you either 
need to cut-n-paste the entire e-mail or you have to first save the mail 
to a text file. both of which are significantly harder than doing a 
cut-n-past of a portion of the message.

David Lang

> The reason I am wondering about this is because in another topic (also in
> 'next') cooking there is an extended support for topic description for the
> branch that states what the purpose of the topic is why the requestor
> wants you to have it (this information can be set and updated with "git
> branch --edit-description").
>
> A respond-to-request-pull wrapper you would use could be:
>
> - Get the e-mail from the standard input;
> - Pick up the signed bits and validate the signature;
> - Perform the requested fetch; and
> - Record the merge (or prepare .git/MERGE_MSG) with both the signed bits.
>
> and the "signed bits" could include:
>
>   - the repository and the branch you were expected to pull;
>   - the topic description.
>
> among other things the requestor can edit when request-pull message is
> prepared.
>
> That would get us back to your "the lieutenant tip is not so special, but
> the merge commit the integrator makes using that tip has the signature for
> this particular pull" model.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-02 23:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <7vsjm6gkte.fsf@alter.siamese.dyndns.org>

On Wed, Nov 2, 2011 at 4:34 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> You keep saying cut-and-paste, but do you mind feeding the e-mail text
> itself to a tool, instead of cut-and-paste?

Feeding the email to a tool is actually a fair amount of extra work.
It would have worked well in the days when I used text-based email
clients that just had a "pipe email to command" model, but that's long
gone.

In contrast, cut-and-paste to another program is easy - but then you
really can't depend on whitespace or headers or other subtle things.

> A respond-to-request-pull wrapper you would use could be:
>
>  - Get the e-mail from the standard input;
>  - Pick up the signed bits and validate the signature;
>  - Perform the requested fetch; and
>  - Record the merge (or prepare .git/MERGE_MSG) with both the signed bits.

So is there any reason this couldn't be cut-and-paste? Make the signed
part small (*not* including diffstat and shortlog), and make it
whitespace-safe, and I wouldn't mind a tool at all.

If it *can* take the whole email, that would probably be a good design
(so that a "pipe email to command"  model would still work), but it
would be much better if it doesn't require it.

> and the "signed bits" could include:
>
>   - the repository and the branch you were expected to pull;
>   - the topic description.
>
> among other things the requestor can edit when request-pull message is
> prepared.

One thing I'd like is that it would also fire up an editor for the
merge, even if it gets the topic description from the email or
cut-and-paste. I often want to fix up peoples grammar etc. That's a
separate argument for trying to keep the signed part minimal - because
 I really don't want to have to maintain spelin errors just because
they are part of what was signed..

                  Linus

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Shawn Pearce @ 2011-11-03  0:06 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Jonathan Nieder, netroby, Git Mail List,
	Tomas Carnecky
In-Reply-To: <20111102232735.GA17466@sigill.intra.peff.net>

On Wed, Nov 2, 2011 at 16:27, Jeff King <peff@peff.net> wrote:
> On Wed, Nov 02, 2011 at 03:41:36PM -0700, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>>
>> > Which is all a roundabout way of saying that the git protocol is really
>> > the sane way to do efficient transfers. An alternative, much simpler
>> > scheme would be for the server to just say:
>> >
>> >   - if you have nothing, then prime with URL http://host/bundle
>> >
>> > And then _only_ clone would bother with checking mirrors. People doing
>> > fetch would be expected to do it often enough that not being resumable
>> > isn't a big deal.
>>
>> I think that is a sensible place to start.

Yup, I agree. The "repo" tool used by Android does this in Python
right now[1].  Its a simple hack, if the protocol is HTTP or HTTPS the
client first tries to download $URL/clone.bundle. My servers have
rules that trap on */clone.bundle and issue an HTTP 302 Found response
to direct the client to a CDN. Works. :-)

[1] http://code.google.com/p/git-repo/source/detail?r=f322b9abb4cadc67b991baf6ba1b9f2fbd5d7812&name=stable

> OK. That had been my original intent, but somebody (you?) mentioned the
> "if you have X" thing at the GitTogether, which got me thinking.
>
> I don't mind starting slow, as long as we don't paint ourselves into a
> corner for future expansion. I'll try to design the data format for
> specifying the mirror locations with that extension in mind.

Right. Aside from the fact that $URL/clone.bundle is perhaps a bad way
to decide on the URL to actually fetch (and isn't supportable over
git:// or ssh://)... we should start with the clone case and worry
about incremental updates later.

> Even if the bundle thing ends up too wasteful, it may still be useful to
> offer a "if you don't have X, go see Y" type of mirror when "Y" is
> something efficient, like git:// at a faster host (i.e., the "I built 3
> commits on top of Linus" case).

Actually, I really think the bundle thing is wasteful. Its a ton of
additional disk. Hosts like kernel.org want to use sendfile() when
possible to handle bulk transfers. git:// is not efficient for them
because we don't have sendfile() capability.

Its also expensive for kernel.org to create each Git repository twice
on disk. The disk is cheap. Its the kernel buffer cache that is damned
expensive. Assume for a minute that Linus' kernel repository is a
popular thing to access. If 400M of that history is available in a
normal pack file on disk, and again 400M is available as a "clone
bundle thingy", kernel.org now has to eat 800M of disk buffer cache
for that one Git repository, because both of those files are going to
be hot.

I think I messed up with "repo" using a Git bundle file as its data
source. What we should have done was a bog standard pack file. Then
the client can download the pack file into the .git/objects/pack
directory and just generate the index, reusing the entire dumb
protocol transport logic. It also allows the server to pass out the
same file the server retains for the repository itself, and thus makes
the disk buffer cache only 400M for Linus' repository.

> Agreed. I was really trying to avoid protocol extensions, though, at
> least for an initial version. I'd like to see how far we can get doing
> the simplest thing.

One (maybe dumb idea I had) was making the $GIT_DIR/objects/info/packs
file contain other lines to list reference tips at the time the pack
was made. The client just needs the SHA-1s, it doesn't necessarily
need the branch names themselves. A client could initialize itself by
getting this set of references, creating temporary dummy references at
those SHA-1s, and downloading the corresponding pack file, indexing
it, then resuming with a normal fetch.

Then we wind up with a git:// or ssh:// protocol extension that
enables sendfile() on an entire pack, and to provide the matching
objects/info/packs data to help a client over git:// or ssh://
initialize off the existing pack files.

Obviously there is the existing security feature that over git:// or
ssh:// (or even smart HTTP), a deleted or rewound reference stops
exposing the content in the repository that isn't reachable from the
other reference tips. The repository owner / server administrator will
have to make a choice here, either the existing packs are not exposed
as available via sendfile() until after GC can be run to rebuild them
around the right content set, or they are exposed and the time to
expunge/hide an unreferenced object is expanded until the GC completes
(rather than being immediate after the reference updates).

But either way, I like the idea of coupling the "resumable pack
download" to the *existing* pack files, because this is easy to deal
with. If you do have a rewind/delete and need to expunge content,
users/administrators already know how to run `git gc --expire=now` to
accomplish a full erase. Adding another thing with bundle files
somewhere else that may or may not contain the data you want to erase
and remembering to clean that up is not a good idea.

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Shawn Pearce @ 2011-11-03  1:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFz7TeQQH3D4Tpp31cZYZoQKeK37jouo+2Kh61Wa07knfw@mail.gmail.com>

On Wed, Nov 2, 2011 at 13:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Nov 1, 2011 at 2:56 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> But on the other hand, in many ways, publishing your commit to the outside
>> world, not necessarily for getting pulled into the final destination
>> (i.e. your tree) but merely for other people to try it out, is the point
>> of no return (aka "don't rewind or rebase once you publish").  "pushing
>> out" might be less special than "please pull", but it still is special.
>
> So I really think that signing the top commit itself is fundamentally wrong.

I really disagree. I like the signed commit approach. It allows for a
lot more workflows than just providing a way for you to validate a
pull from a trusted lieutenant. Debian/Gentoo folks want a way to sign
every commit in their workflow. Just because you don't want that and
think its crazy doesn't mean its not a valid workflow for that
community and is something Git shouldn't support. I never use `git
stash`. I hate the damn command. Yet its still there. I just choose
not to use it. Junio's gpgsig header on each commit is also optional,
and communities/contributors can choose to use (or ignore) the feature
as they need to.

> That commit may not even be *yours*. You may have pulled it from a
> sub-lieutenant as a fast-forward, or similar. Amending it later would
> be actively very very *wrong*.

Obviously you shouldn't amend a commit that would otherwise be a
fast-forward. But why not write a new empty signed commit on top, and
teach `git log` without the verify signatures flag to skip over
commits that have a gpgsig header line, have exactly one parent, and
whose parent tree matches the commit's own tree? This removes these
commits from the normal `git log` revision output, but yet the flow of
changes is still very visible within the history.

As I understand it, the point of multiple Signed-off-by lines in
commit message bodies is to show the flow of a change, who reviewed
and applied a given commit, until it finally lands in a tree where its
commit SHA-1 is frozen in stone and you can later pull it. The empty
signed commit on top of a fast-forward provides that same flow of a
change, readily visible with standard `git log` tools, but doesn't
have to clutter up history if we teach log how to skip this particular
type. Similar to the --no-merges way to skip merges. :-)

> So quite frankly, I think the stuff in pu (or next?) is completely
> mis-designed. Doing it in the commit is wrong for fundamental reasons,
> which all boil down to a simple issue:

Totally disagree. I'm really in favor of embedding these into the
commit headers the way Junio has done.

>  - you absolutely *need* to add the signature later. You *cannot* do
> it at "git commit" time.

Why can't you add it at commit time? What is stopping me from running
`git commit -S` every time I make a commit? Is it that my fingers will
wear out more quickly because I have to type my pass-phrase too often?

What is wrong with making a signed commit on a commit I have a high
level of confidence in, but not signing the others? In my own workflow
I make a lot of commit --amends  / rebases until I am pretty confident
in the code being written and organized the way I think it should be
for distribution to others. But at some point in that workflow I'm
doing an --amend or a rebase to make that last final touch, and during
that commit I can add -S to make it signed, because I'm pretty certain
its ready to go. At that point, barring some horrific bug or reviewer
comments, I am unlikely to change the commit. I know at the time I
make that commit that I am pretty confident in the commit, so I take
the extra few key strokes to sign it.

> That's a fundamental issue both from a "workflow model" issue (ie you
> want to sign stuff after it has passed testing etc,

Why do I have to wait until its tested to sign it? The gpgsig
signature isn't any more special than the Signed-off-by line I put
into my commit message to agree to the developer's certificate of
origin, nor is it any more special than the committer line in the
commit header. Its just a statement on the commit that I have a
reasonable enough confidence in the value of this particular commit
and its ancestors that I should take the time to unlock my GPG key and
sign the content in case I do distribute this to others.

If you are going to spend time testing a commit, its probably going to
take longer to perform that testing than it is to perform the GPG key
unlock and signature. So why are you complaining about the time it
takes to sign something you think is worthy of testing?  If the tests
fail, you'll need to rewind/amend/whatever to address the breakage. If
the tests pass, the commit is already signed and ready for
distribution. If you are spending a lot of time signing commits that
are highly likely to fail tests, well, maybe you should look at other
ways to improve your workflow so that you have a higher level of
confidence in the code you record and assume will be a permanent part
of the project's history.

> but you may need
> to commit it in order to *get* testing),

Maybe consider allowing a ".dirty" suffix like git-core does on
builds? Or if you are submitting the code to a remote test cluster
that auto-compiles the code for you (and that is why you need a
commit), it sounds like the time it takes for that to push, compile,
test, and report back is way higher than the time it takes to make the
signature. So you probably should only be submitting something that
you had a reasonable level of confidence in. So you should go ahead
and sign it before sending it for testing, in case the tests do pass
and you want to publish that commit.

> as well as from a
> "fundamental git datastructures" issue (ie you would want to sign
> commits that aren't yours.

Sure. But this is why you can make an empty commit and sign that.

> "git commit --amend" is not the answer - that destroys the fundamental
> concept of history being immutable, and while it works for your local
> commits, it doesn't work for anybody elses commits, or for stuff you
> already pushed out.

Nobody said you had to amend everything. You can add an empty commit.

> And "add a fake empty commit just for the signature" is not the answer
> either - because that is clearly inferior to the tags we already had.

Really? I disagree. The commit DAG scales quite well. The tag
namespace does not. A refs/signatures/$COMMIT_SHA1 namespace also does
not scale well.

An empty commit with a gpgsig header has about the same object cost as
an annotated tag once packed. But it has the advantage that the damn
thing doesn't clog up the reference space, the reference handling
code, or the advertisements in the native protocol. As history goes
on, older signatures are less relevant, and automatically are
avoided/skipped/bypassed by the normal DAG walking code. Tags don't do
this well because they have no relationship to the project history.

The only downside to an empty commit with the gpgsig header is I
cannot grab an arbitrarily deep ancestor and say "Who has signed a
commit that depends on this"? Today we already have this with git
describe --contains (aka git name-rev) for annotated tags. Its a new
feature we have to teach to some part of the log machinery, but the
algorithm will be easier because it doesn't have to mess with the
mapping table of tag objects. It just has to start digging from roots,
remembering each commit that has a gpgsig on any given branch path,
and then outputting the matches when it finds the commit in question.

The commit approach also has the advantage that your tree
automatically carries any lieutenant's signatures, by virtue of them
already being frozen in the commits.  This allows anyone downstream of
you to verify the same signatures, and check them against their own
keyring contents. If the signatures are all detached in some transient
annotated tag space, its impossible for anyone other than you to
verify pull requests. I would hate to say we have this nice
distributed version control system, but only Linus can prove the pull
requests in his repository are what they claim, and we have to then
implicitly trust you to resign that data without the original
signatures being present. $DAY_JOB would feel a lot better about the
integrity of the Linux kernel repository if _ANYONE_ can validate pull
requests offline after they have happened.

> I dunno. Did I miss something? As far as I can tell, the signed tags
> that we've had since day one are *clearly* much better in very
> fundamental ways.

Completely disagree. :-)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox