Git development
 help / color / mirror / Atom feed
* Re: git-p4: problem with commit 97a21ca50ef8
From: Vitor Antunes @ 2011-11-03 11:04 UTC (permalink / raw)
  To: Michael Wookey; +Cc: Pete Wyckoff, Git Mailing List, Luke Diamand
In-Reply-To: <CAOk9v+_xaS_Y1m17TROOSjgiscT+QEJWbpZbAZFmh8_tAviF6Q@mail.gmail.com>

Hi Michael,

On Wed, Nov 2, 2011 at 10:42 PM, Michael Wookey <michaelwookey@gmail.com> wrote:
> I tried your suggested version of git-p4 (at rev 630fb678c46c) and
> unfortunately, the perforce repository fails to import. Firstly, there
> was a problem with importing UTF-16 encoded files, secondly the
> "apple" filetype files are still skipped.

I had no intention of directing you to try that version. Sorry for
misleading you on this.

I just found it interesting that P4's KB contains an article that
directs users to another version which isn't this one.

-- 
Vitor Antunes

^ permalink raw reply

* Re: [PATCH v3 2/7] invalidate_ref_cache(): take the submodule as parameter
From: Michael Haggerty @ 2011-11-03 10:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <7vmxczmrb0.fsf@alter.siamese.dyndns.org>

On 10/17/2011 08:00 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> On 10/12/2011 09:19 PM, Junio C Hamano wrote:
>>> Michael Haggerty <mhagger@alum.mit.edu> writes:
>>>  ...
>>> Probably that is what all the existing callers want, but I would have
>>> expected that an existing feature would be kept, perhaps like this
>>> instead:
>>>
>>> 	if (!submodule) {
>>> 		struct ref_cache *c;
>>>                 for (c = ref_cache; c; c = c->next)
>>>                 	clear_ref_cache(c);
>>> 	} else {
>>> 		clear_ref_cache(get_ref_cache(submodule);
>>> 	}
>> ...
>> Your specific suggestion would not work because currently
>> submodule==NULL signifies the main module.  However, it would be easy to
>> add the few-line function when/if it is needed.
> 
> I think "submodule==NULL" is probably a mistake; "" would make more sense
> given that you are storing the string in name[FLEX_ARRAY] field.

Sorry I didn't respond to this earlier.

The public API convention (which predates my changes) is that "char
*submodule" arguments either point at the relative path to the submodule
or are NULL to denote the main module.  But since these are stored
internally in a name[FLEX_ARRAY] field, I have been using "" internally
to denote the main module.  I believe that everything is done correctly,
but I admit that the use of different conventions internally and
externally is a potential source of programming errors.

If this is viewed as something that needs changing, the easiest thing
would probably be to add a "const char *submodule" in the ref_cache data
structure that either contains NULL or points at the name field, and to
consistently use the convention that the main module must always be
denoted by NULL.  The space overhead would be negligible because the
number of ref_cache objects is limited to the number of submodules plus 1.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* [PATCH] log: support --full-diff=<pathspec>
From: Nguyễn Thái Ngọc Duy @ 2011-11-03 10:01 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

I wanted to check if any patches that update builtin/fsck.c also
update 't' directory and git seemed not support this case
(true?). With this I can do

  git log --patch --full-diff="'builtin/fsck.c' 't'" -- builtin/fsck.c

I guess this may be something people find useful.

This patch is a bit inconvenient though because <pathspec> is parsed
with sq_dequote_to_argv() and all arguments must be wrapped by ''.
Also "full-diff" name does not make much sense when it comes with
pathspec.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 revision.c |   19 +++++++++++++++++++
 revision.h |    1 +
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/revision.c b/revision.c
index 8764dde..f508953 100644
--- a/revision.c
+++ b/revision.c
@@ -13,6 +13,7 @@
 #include "decorate.h"
 #include "log-tree.h"
 #include "string-list.h"
+#include "quote.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -1531,6 +1532,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--full-diff")) {
 		revs->diff = 1;
 		revs->full_diff = 1;
+	} else if (!prefixcmp(arg, "--full-diff=")) {
+		revs->diff = 1;
+		revs->full_diff = 1;
+		revs->full_diff_opt = arg + strlen("--full-diff=");
 	} else if (!strcmp(arg, "--full-history")) {
 		revs->simplify_history = 0;
 	} else if (!strcmp(arg, "--relative-date")) {
@@ -1819,6 +1824,20 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			revs->prune = 1;
 		if (!revs->full_diff)
 			diff_tree_setup_paths(revs->prune_data.raw, &revs->diffopt);
+		else if (revs->full_diff_opt) {
+			const char **argv = NULL;
+			int alloc = 0, nr = 0;
+			char *arg;
+
+			arg = xstrdup(revs->full_diff_opt);
+			sq_dequote_to_argv(arg, &argv, &nr, &alloc);
+
+			ALLOC_GROW(argv, nr + 1, alloc);
+			argv[nr] = NULL;
+			argv = get_pathspec(revs->prefix, argv);
+
+			diff_tree_setup_paths(argv, &revs->diffopt);
+		}
 	}
 	if (revs->combine_merges)
 		revs->ignore_merges = 0;
diff --git a/revision.h b/revision.h
index 6aa53d1..baa709c 100644
--- a/revision.h
+++ b/revision.h
@@ -137,6 +137,7 @@ struct rev_info {
 	const char	*subject_prefix;
 	int		no_inline;
 	int		show_log_size;
+	const char      *full_diff_opt;
 
 	/* Filter by commit log message */
 	struct grep_opt	grep_filter;
-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply related

* How to find a commit that introduces (not removes) a string?
From: Sebastian Schuberth @ 2011-11-03  9:50 UTC (permalink / raw)
  To: git

Hi all,

I know about git log's -S / -G, but I'm unable to make these search through *introduced* strings only. Is there a way to do so?

Thanks!

PS: I also read [1], but although the author claims to be interested in introduced strings only, he seems to be satisfied with -G, which slightly puzzles me.

[1] http://stackoverflow.com/questions/5816134/git-finding-a-commit-that-introduced-a-string

-- 
Sebastian Schuberth

^ permalink raw reply

* [PATCH] fsck: print progress
From: Nguyễn Thái Ngọc Duy @ 2011-11-03  8:50 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Nguyễn Thái Ngọc Duy
In-Reply-To: <20111103033325.GA10230@sigill.intra.peff.net>

fsck is usually a long process and it would be nice if it prints
progress from time to time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 2011/11/3 Jeff King <peff@peff.net>:
 > We're actually leaking some memory here, since stop_progress will also
 > free() the progress object and any associated resources. It's not a lot,
 > but it's kind of ugly.

 Fixed by always calling stop_progress_msg() but make sure no newline
 is printed (actually "done\n" is not)

 Also fixed "progress" initialization in fsck_object_dir()

 > Or perhaps we could even come up with a total object count
 > before starting.  I guess it would involve mapping each pack index
 > simultaneously, though by my reading of the code, I think we do that
 > anyway (the opened index is cached in the pack object).

 I think this way is better because we can count two numbers at a
 time, nr. packs and nr. objects of current pack. Total object
 (I assume you mean of all packs) may be less informative.

 Documentation/git-fsck.txt |   12 +++++++++-
 builtin/fsck.c             |   52 +++++++++++++++++++++++++++++++++++++++++--
 pack-check.c               |   11 ++++++--
 pack.h                     |    4 ++-
 progress.c                 |    2 +-
 5 files changed, 72 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index a2a508d..5245101 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -10,7 +10,8 @@ SYNOPSIS
 --------
 [verse]
 'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]
-	 [--[no-]full] [--strict] [--verbose] [--lost-found] [<object>*]
+	 [--[no-]full] [--strict] [--verbose] [--lost-found]
+	 [--[no-]progress] [<object>*]
 
 DESCRIPTION
 -----------
@@ -72,6 +73,15 @@ index file, all SHA1 references in .git/refs/*, and all reflogs (unless
 	a blob, the contents are written into the file, rather than
 	its object name.
 
+--progress::
+--no-progress::
+	When fsck is run in a terminal, it will show the progress.
+	These options can force progress to be shown or not
+	regardless terminal check.
++
+Progress is not shown when --verbose is used. --progress is ignored
+in this case.
+
 It tests SHA1 and general object sanity, and it does full tracking of
 the resulting reachability and everything else. It prints out any
 corruption it finds (missing or bad objects), and if you use the
diff --git a/builtin/fsck.c b/builtin/fsck.c
index df1a88b..e19b78c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -11,6 +11,7 @@
 #include "fsck.h"
 #include "parse-options.h"
 #include "dir.h"
+#include "progress.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -27,6 +28,8 @@ static const char *head_points_at;
 static int errors_found;
 static int write_lost_and_found;
 static int verbose;
+static int show_progress = -1;
+
 #define ERROR_OBJECT 01
 #define ERROR_REACHABLE 02
 
@@ -512,15 +515,20 @@ static void get_default_heads(void)
 static void fsck_object_dir(const char *path)
 {
 	int i;
+	struct progress *progress = NULL;
 
 	if (verbose)
 		fprintf(stderr, "Checking object directory\n");
 
+	if (show_progress)
+		progress = start_progress("Checking object directories", 256);
 	for (i = 0; i < 256; i++) {
 		static char dir[4096];
 		sprintf(dir, "%s/%02x", path, i);
 		fsck_dir(i, dir);
+		display_progress(progress, i+1);
 	}
+	stop_progress(&progress);
 	fsck_sha1_list();
 }
 
@@ -591,6 +599,7 @@ static struct option fsck_opts[] = {
 	OPT_BOOLEAN(0, "strict", &check_strict, "enable more strict checking"),
 	OPT_BOOLEAN(0, "lost-found", &write_lost_and_found,
 				"write dangling objects in .git/lost-found"),
+	OPT_BOOL   (0, "progress", &show_progress, "show progress"),
 	OPT_END(),
 };
 
@@ -603,6 +612,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	read_replace_refs = 0;
 
 	argc = parse_options(argc, argv, prefix, fsck_opts, fsck_usage, 0);
+
+	if (show_progress == -1)
+		show_progress = isatty(2);
+	if (verbose)
+		show_progress = 0;
+
 	if (write_lost_and_found) {
 		check_full = 1;
 		include_reflogs = 0;
@@ -622,20 +637,51 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 	if (check_full) {
 		struct packed_git *p;
+		int i, nr_packs = 0;
+		struct progress *progress = NULL;
 
 		prepare_packed_git();
+
 		for (p = packed_git; p; p = p->next)
+			nr_packs++;
+
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
+			if (show_progress) {
+				char buf[32];
+				snprintf(buf, sizeof(buf), "Verifying pack %d/%d",
+					 i, nr_packs);
+				if (open_pack_index(p))
+					continue;
+				progress = start_progress(buf, p->num_objects);
+			}
 			/* verify gives error messages itself */
-			verify_pack(p);
+			verify_pack(p, progress);
 
-		for (p = packed_git; p; p = p->next) {
+			if (p->next)
+				stop_progress_msg(&progress, NULL);
+		}
+		stop_progress(&progress);
+
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
+			char buf[64];
 			uint32_t j, num;
 			if (open_pack_index(p))
 				continue;
 			num = p->num_objects;
-			for (j = 0; j < num; j++)
+
+			snprintf(buf, sizeof(buf), "Checking objects on pack %d/%d",
+				 i, nr_packs);
+			if (show_progress)
+				progress = start_progress(buf, num);
+			for (j = 0; j < num; j++) {
 				fsck_sha1(nth_packed_object_sha1(p, j));
+				display_progress(progress, j+1);
+			}
+
+			if (p->next)
+				stop_progress_msg(&progress, NULL);
 		}
+		stop_progress(&progress);
 	}
 
 	heads = 0;
diff --git a/pack-check.c b/pack-check.c
index 0c19b6e..e30c18c 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "pack.h"
 #include "pack-revindex.h"
+#include "progress.h"
 
 struct idx_entry {
 	off_t                offset;
@@ -42,7 +43,8 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
 }
 
 static int verify_packfile(struct packed_git *p,
-		struct pack_window **w_curs)
+			   struct pack_window **w_curs,
+			   struct progress *progress)
 {
 	off_t index_size = p->index_size;
 	const unsigned char *index_base = p->index_data;
@@ -126,7 +128,10 @@ static int verify_packfile(struct packed_git *p,
 			break;
 		}
 		free(data);
+		if ((i & 1023) == 0)
+			display_progress(progress, i);
 	}
+	display_progress(progress, i);
 	free(entries);
 
 	return err;
@@ -155,7 +160,7 @@ int verify_pack_index(struct packed_git *p)
 	return err;
 }
 
-int verify_pack(struct packed_git *p)
+int verify_pack(struct packed_git *p, struct progress *progress)
 {
 	int err = 0;
 	struct pack_window *w_curs = NULL;
@@ -164,7 +169,7 @@ int verify_pack(struct packed_git *p)
 	if (!p->index_data)
 		return -1;
 
-	err |= verify_packfile(p, &w_curs);
+	err |= verify_packfile(p, &w_curs, progress);
 	unuse_pack(&w_curs);
 
 	return err;
diff --git a/pack.h b/pack.h
index 722a54e..572794f 100644
--- a/pack.h
+++ b/pack.h
@@ -70,10 +70,12 @@ struct pack_idx_entry {
 	off_t offset;
 };
 
+struct progress;
+
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
-extern int verify_pack(struct packed_git *);
+extern int verify_pack(struct packed_git *, struct progress *);
 extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 extern char *index_pack_lockfile(int fd);
 extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned char *);
diff --git a/progress.c b/progress.c
index 3971f49..fc96a5f 100644
--- a/progress.c
+++ b/progress.c
@@ -245,7 +245,7 @@ void stop_progress_msg(struct progress **p_progress, const char *msg)
 	if (!progress)
 		return;
 	*p_progress = NULL;
-	if (progress->last_value != -1) {
+	if (progress->last_value != -1 && msg) {
 		/* Force the last update */
 		char buf[128], *bufp;
 		size_t len = strlen(msg) + 5;
-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply related

* Re: Folder level Acces in git
From: Magnus Bäck @ 2011-11-03  7:17 UTC (permalink / raw)
  To: redhat1981; +Cc: git
In-Reply-To: <1320300655224-6958047.post@n2.nabble.com>

On Thursday, November 03, 2011 at 07:10 CET,
     redhat1981 <redhat1981@gmail.com> wrote:

> I am using Gitosis, Adding the gitosis conf file
> 
> [group testabc]
> writable = testabc
> members =   shrii Abhijeet premkumar
> add cgit = yes
> gitweb = yes
> 
> 
> [group testabc-readonly]
> readonly = testabc
> members =  Ganesh Shweta
> add cgit = yes
> gitweb = yes
> 
> Inside the repository, testabc let us say there are folders folder1,
> folder 2 etc, I want some users to have read/write, read or no access
> to the folder1 or folder2, Is this possible in Git, I have done it in
> SVN, Please help!!

Given Git's nature, you can't have read access restrictions on a sub-git
level (i.e. file/directory level). For basically the same reason, you
can never prevent users from making (local) commits that modify certain
paths (but you can encourage people to have local hooks to enforce such
policies). What you *can* do is install a server-side update hook that
rejects attempts to push commits that modify certain paths. If you're
willing to trade Gitosis for Gitolite, you get that feature for free.

http://book.git-scm.com/5_git_hooks.html
http://progit.org/book/ch4-8.html

-- 
Magnus Bäck                   Opinions are my own and do not necessarily
SW Configuration Manager      represent the ones of my employer, etc.
Sony Ericsson

^ permalink raw reply

* Re: [PATCH] Add options to specify snapshot file name, prefix
From: Thomas Guyot-Sionnest @ 2011-11-03  6:28 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: git
In-Reply-To: <1320298641-11149-1-git-send-email-dermoth@aei.ca>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Please disregard; i've actually managed to break the patch while playing
with it.

I will resubmit.

- --

Thomas

On 11-11-03 01:37 AM, Thomas Guyot-Sionnest wrote:
> commit b629275 implemented "smart" snapshot names and prefixes. I have
> scripts that used to rely on the old behaviour which allowed in some
> cases to have fixed prefix, and would require modifications to work with
> newer Gitweb.
> 
> This patch adds two parameters for overriding the snapshot name and
> prefix, sn and sp respectively. For example, to get a snapshot
> named "myproject.[suffix]" with no prefix one can add this query string:
>   ?sn=myproject;sp=
> 
> Signed-off-by: Thomas Guyot-Sionnest <dermoth@aei.ca>
> ---
>  gitweb/gitweb.perl |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..4820ef7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -756,6 +756,8 @@ our @cgi_param_mapping = (
>  	searchtext => "s",
>  	searchtype => "st",
>  	snapshot_format => "sf",
> +	snapshot_name => "sn",
> +	snapshot_prefix => "sp",
>  	extra_options => "opt",
>  	search_use_regexp => "sr",
>  	ctag => "by_tag",
> @@ -6684,6 +6686,12 @@ sub git_snapshot {
>  	}
>  
>  	my ($name, $prefix) = snapshot_name($project, $hash);
> +	if (defined($input_params{'snapshot_name'})) {
> +		$name = $input_params{'snapshot_name'};
> +	}
> +	if (defined($input_params{'snapshot_prefix'})) {
> +		$prefix = $input_params{'snapshot_prefix'};
> +	}
>  	my $filename = "$name$known_snapshot_formats{$format}{'suffix'}";
>  	my $cmd = quote_command(
>  		git_cmd(), 'archive',


- -- 
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6yNJcACgkQ6dZ+Kt5BchZkCgCgvJ9kAF636HYCAwQ7LoWu8ckG
8+8Anj5ocii3c7vKQ2/eVjcAvDJxsvhQ
=jGwr
-----END PGP SIGNATURE-----

^ permalink raw reply

* Folder level Acces in git
From: redhat1981 @ 2011-11-03  6:10 UTC (permalink / raw)
  To: git

Hi,

I am using Gitosis, Adding the gitosis conf file

[group testabc]
writable = testabc
members =   shrii Abhijeet premkumar
add cgit = yes
gitweb = yes


[group testabc-readonly]
readonly = testabc
members =  Ganesh Shweta
add cgit = yes
gitweb = yes

Inside the repository, testabc let us say there are folders folder1, folder
2 etc, I want some users to have read/write, read or no access to the
folder1 or folder2, Is this possible in Git, I have done it in SVN, Please
help!!

redhat1981@gmail.com


--
View this message in context: http://git.661346.n2.nabble.com/Folder-level-Acces-in-git-tp6958047p6958047.html
Sent from the git mailing list archive at Nabble.com.

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Shawn Pearce @ 2011-11-03  4:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Jonathan Nieder, netroby, Git Mail List,
	Tomas Carnecky
In-Reply-To: <20111103024248.GA9492@sigill.intra.peff.net>

On Wed, Nov 2, 2011 at 19:42, Jeff King <peff@peff.net> wrote:
> On Wed, Nov 02, 2011 at 05:06:53PM -0700, Shawn O. Pearce wrote:
>
> I thought of doing something like that, but I wanted to be able to make
> cross-domain links. The "302 to a CDN" thing is a clever hack, but it
> requires more control of the webserver than some users might have. And
> of course it doesn't work for the "redirect to git:// on a different
> server" trick. Or redirect from "git://".

I agree. Later I said I regret this being a bundle file. I also regret
it being this $URL/clone.bundle thing. Its a reasonable quick hack in
Python for repo. Its cheap for my servers to respond 404 Not Found or
302 Found, and cheap to use the CDN. But it isn't the right solution
for git-core.

It has given us some useful information already in the context of
android.googlesource.com. It appears to work quite well for
distributing the large Android operating system. So the notion of
making packs available from another URL than the main repository, and
doing it as primarily a pack and not the native Git protocol, with a
follow-up incremental fetch to bring the client current seems to work.
 :-)

> My thought of having it in "refs/mirrors" is only slightly less hacky,
> but I think covers all of those cases. :)

Right, but this would have been a bit more work for me to code in Python. :-)

Long term this may be a better approach, because it does allow the
user to control the redirect without having full control over their
HTTP server. It also supports redirections across protocols like you
noted above. So its probably the direction we will see git-core take.

>> Actually, I really think the bundle thing is wasteful.... sendfile() capability.
>
> I didn't quite parse this. You say it is wasteful, but then indicate
> that it can use sendfile(), which is a good thing.

Apparently I was babbling. Based on what else you say, we agree. That
is good enough for me.

> However, I do agree with this:
>
>> Its also expensive for kernel.org to create each Git repository twice
>> on disk. The disk is cheap. Its the kernel buffer cache that is damned
>> expensive.
>
> Doubling the disk cache required is evil and ugly. I was hoping it
> wouldn't matter because the bundle would be hosted on some far-away CDN
> server anyway, though. But that is highly dependent on your setup. And
> it's really just glossing over the fact that you have twice as many
> servers. ;)

Right. :-)

In my opinion this is the important part. We shouldn't double the disk
usage required to support this. Most users can't afford the extra disk
cache or the extra server required to make this work well. But they
can use sendfile() on the server they have and get a lot of
improvement in clone speed due to lower system load, plus resumable
clone for the relatively stable history part.

> Another issue with packs is that they generally aren't supposed to be
> --thin on disk, whereas bundles can be. So I could point you to a
> succession of bundles. Which is maybe a feature, or maybe just makes
> things insanely complex[1].

Actually we can store --thin on disk safely.  Don't laugh until you
finish reading it through.

To build an incremental pack we modify pack-objects to construct a
completed thin pack on disk. Build up the list of objects that you
want in the thin pack, as though it were thin. Use REF_DELTA format to
reference objects that are not in this set but are delta bases. Copy
the necessary delta bases from the base pack over to the thin pack, at
the end just like it would be if received over the wire. The pack is
now self-contained like its supposed to be, but the tail of it is
redundant information.

If you cache alongside of the pack the "thin" object count, the cut
offset of the thin vs. completed bases, and the SHA-1 of the "thin"
pack, you can serve the "thin" pack by copying the header, then the
region of the file up to the cut point, and the final SHA-1. And there
are no pack file format changes involved.

:-)

Obviously this has some downside. Using REF_DELTA instead of OFS_DELTA
for the relatively small number of references from the "thin" part to
the completed part at the tail isn't a big disk space overhead. The
big overhead is storing the boundary data that served as delta bases
at the tail of this incremental pack. But we already do that when you
transfer this section of data over the network and it was more than
100 objects.

So I think we can get away with doing this. The serving repository is
in no worse state than if the owner had just pushed all of that
incremental stuff into the serving repository and it completed as a
thin pack. With only 2 packs in the serving repository (e.g. the
historical stuff that is stable, and the incremental current thin pack
+ completed bases), git gc --auto wouldn't even kick in to GC this
thing for a while *anyway*. So we already probably have a ton of
repositories in the wild that exhibit this disk layout and space
usage, and nobody has complained about it.

For a server admin or repository owner who cares about his user's
resumable clone support, carrying around a historical pack and a
single new incremental pack for say 2-3 months before repacking the
entire thing down to 1 new historical pack... the disk space and
additional completed base data is an acceptable cost. We already do
it.

Clients can figure out whether or not they should use an incremental
pack download vs the native Git protocol if the incremental pack does
like a bundle does and stores the base information alongside of it.
Actually you don't want the base (the ^ lines in a bundle), but the
immediate child of those. If the client has any of those children,
there is some chance the client has other objects in the pack and
should favor native protocol. But if the client has none of those base
children, but does have the base, it may be more efficient to download
the entire pack to bring the client current.

The problem with incremental pack updates is balancing the number of
round-trip requests against the update rate of the repository against
the polling frequency of the client. Its not an easy thing to solve.

However, we may be able to do better if the server can do a reasonably
fast concat of these thin pack slices together by writing a new object
header and computing the SHA-1 trailer as it goes. Instead of
computing actual graph connectivity, just concat packs together
between the base children and the requested tips. This probably
requires that the client ask for every branch (e.g. the typical
refs/heads/*:refs/remotes/origin/* refspec) and that branches didn't
rewind. But I think this is so common its perhaps worthwhile to look
into optimizing. But note we can do this in the native protocol at the
server side without telling the client anything, or changing the
protocol. It just isn't resumable without a bit more glue to have a
state marker available to the client. Nor does it work on a CDN
without giving the client more information. :-)

> So the sendfile() stuff would always happen over http.

I'm OK with that. I was just saying we may be able to also support
sendfile() over git:// if the repository owner / git-daemon owner
wants us to. Or if not sendfile(), a simple read-write loop that
doesn't have to look at the data, since the client will validate it
all.

> Yeah, I'm liking that idea. In reference to my [1] above, what I've
> started with is making:
>
>  git fetch http://host/foo.bundle

This should work, whether or not we use it for resumable clone. Its
just nice to have that tiny bit of extra glue to make it easy to pull
a bundle. So I'd like this too. :-)

> Pulling down a non-thin packfile makes the problem go away. We can spool
> it right into objects/pack, index it on the fly, and if all is well,
> move it into its final filename. If the transfer is interrupted, you
> drop what's been indexed so far, finish the transfer, and then re-start
> the indexing from scratch (actually, the "on the fly" would probably
> involve teaching index-pack to be clever about incrementally reading a
> partially written file, but it should be possible).

I wonder if we can teach index-pack to work with a thin pack on disk
and complete that by appending to the file, in addition to the
streaming from stdin it supports. Seems like that should be possible.
So then you could save a thin pack to a temp file on disk, and thus
could split a bundle header from its pack content, saving them into
two different temp files, allowing index-pack to avoid copying the
pack portion if its non-thin, or if its a huge thin pack.

I did think about doing this in "repo" and decided it was complex, and
not worth the effort. So we spool. 2G+ bundles. Its not the most
pleasant user experience. If I had more time, I would have tried to
split the bundle header from the pack and written the pack directly
off for index-pack to read from disk.

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  4:13 UTC (permalink / raw)
  To: Jochen Striepe
  Cc: Shawn Pearce, Junio C Hamano, git, James Bottomley, Jeff Garzik,
	Andrew Morton, linux-ide, LKML
In-Reply-To: <20111103032205.GA25888@pompeji.miese-zwerge.org>

On Wed, Nov 2, 2011 at 8:22 PM, Jochen Striepe <jochen@tolot.escape.de> wrote:
>
> It seems quite useless and leading to false conclusions in several cases
> where the merger's gpg output differs from someone's checking later on,
> e.g. when
>
>  - the signing key has been revoked in the mean time (for whatever
>   reasons)
>  - the signing key has expired
>  - the public part of the signing key is not available for the general
>   public.

So I don't think those are *big* issues. Sure, you'd want the public
key to be public for it to make any real sense to save, but on the
other hand, they *are* generally public. Yes, yes, you might have keys
that are only used - and only made public - within some particular
organization, but in that case the source code that gets signed with
those keys would tend to be private to that organization too, so..

And yes, keys get revoked or they expire, but that's still a pretty
rare event, so it doesn't really invalidate the argument that making
the original signed content available can quite often be useful - even
if it's not guaranteed to *always* be useful.

No, my main objection to saving the data is that it's ugly and it's
redundant. Sure, in practice you can check the signatures later fine
(with the rare exceptions you mention), but even when you can do it,
what's the big upside?

And there are much bigger real downsides, imho.

For example, let's say that we do eventually end up switching from
SHA1 to SHA256 in git, and we do a full re-import of the tree. Guess
what? All those signatures are now just so much garbage. Sure, you can
recreate them (create some trusted script that you agree does a 1:1
transform, and re-sign everything), but in practice you can't ever
really do that - because all those things are tied to the tree, so you
need to have *everybodys* private keys in one place to do so. And the
people who signed things initially would have to be insane to allow
that.

So I'm actually of the opinion that "internal signatures" are bad
design at a rather fundamental level.

In contrast, the "external signed tags" are fine: it's not just that
there are much fewer of them, it's that they are *independent*. So you
can easily re-generate the signed tags, because each signer can
*individually* decide to validate the newly converted tree, and sign
off on the fact that the conversion was done identically using new
external tags with signatures.

This was one of the reasons I made the signed tags work the way they
do. And it wasn't because I was extremely far-sighted and thought of
all the problems that internal signatures have - it's because monotone
had their internal signatures, and every other email on the monotone
list was about all the problems it caused.

> AFAIK gpg just gives you an error code and a message like e.g. "Key has
> expired" without stating if the key was valid _when signing the commit_.
>
> How do you plan to handle this when keeping the signature in the
> repository? Or am I overlooking something?

So see above - I just wouldn't worry about it. The possible few cases
where it would occur are dwarfed by the cases where it *doesn't*
occur, and those are the ones I'd concentrate on. They are the ones
that need to be important enough that it's even worth carrying the
random noise around.

Are they?

So I do think that there are real upsides at the *process* level where
you can use the signatures to verify that what is pulled is pulled
from the person you thought it was. I don't think anybody disputes
those advantages. But outside of that I think it gets very gray, and
there real disadvantages.

That said, I don't care *that* much. I don't mind polluting the merge
commits with information that I don't think is really worth it. So I'd
be willing to carry the signature information around, although I'd
hope to minimize it and have some sane way to hide it.

            Linus

^ permalink raw reply

* Re: [PATCH] fsck: print progress
From: Jeff King @ 2011-11-03  3:33 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1320290513-8878-1-git-send-email-pclouds@gmail.com>

On Thu, Nov 03, 2011 at 10:21:53AM +0700, Nguyen Thai Ngoc Duy wrote:

> fsck is usually a long process and it would be nice if it prints
> progress from time to time.

The output looks good to me. Code looks sane overall, with one comment:

> +		for (i = 1, p = packed_git; p; p = p->next, i++) {
> +			if (show_progress) {
> +				char buf[32];
> +				snprintf(buf, sizeof(buf), "Verifying pack %d/%d",
> +					 i, nr_packs);
> +				if (open_pack_index(p))
> +					continue;
> +				progress = start_progress(buf, p->num_objects);
> +			}
>  			/* verify gives error messages itself */
> -			verify_pack(p);
> +			verify_pack(p, progress);
> +
> +			/*
> +			 * we do not stop progress here, let the next
> +			 * progress line overwrite the current one for
> +			 * the next pack.
> +			 */
> +		}
> +		stop_progress(&progress);

We're actually leaking some memory here, since stop_progress will also
free() the progress object and any associated resources. It's not a lot,
but it's kind of ugly.

Perhaps there should be a special version of stop_progress that handles
this better? Or perhaps we could even come up with a total object count
before starting.  I guess it would involve mapping each pack index
simultaneously, though by my reading of the code, I think we do that
anyway (the opened index is cached in the pack object).

-Peff

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Jochen Striepe @ 2011-11-03  3:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Shawn Pearce, Junio C Hamano, git, James Bottomley, Jeff Garzik,
	Andrew Morton, linux-ide, LKML
In-Reply-To: <CA+55aFyXg32mko8TOGCfGHpr3jHBEgcKiK7HdVwq0Wez0fAs9A@mail.gmail.com>

	Hi,

On Wed, Nov 02, 2011 at 07:25:17PM -0700, Linus Torvalds wrote:
> To me, the point of the tag is so that the person doing the merge can
> verify that he merges something trusted.
> 
> However, everybody else seems to disagree, and wants that stupid
> signature to live along in the repository.

It seems quite useless and leading to false conclusions in several cases
where the merger's gpg output differs from someone's checking later on,
e.g. when

 - the signing key has been revoked in the mean time (for whatever
   reasons)
 - the signing key has expired
 - the public part of the signing key is not available for the general
   public.

AFAIK gpg just gives you an error code and a message like e.g. "Key has
expired" without stating if the key was valid _when signing the commit_.

How do you plan to handle this when keeping the signature in the
repository? Or am I overlooking something?


Thanks,
Jochen.

^ permalink raw reply

* [PATCH] fsck: print progress
From: Nguyễn Thái Ngọc Duy @ 2011-11-03  3:21 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Nguyễn Thái Ngọc Duy
In-Reply-To: <CACsJy8DN74QWYA_NzBCGgp_VdQpV5PqGMgAFUKYbOecVqw6HYQ@mail.gmail.com>

fsck is usually a long process and it would be nice if it prints
progress from time to time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-fsck.txt |   12 ++++++++-
 builtin/fsck.c             |   57 +++++++++++++++++++++++++++++++++++++++++--
 pack-check.c               |   11 ++++++--
 pack.h                     |    4 ++-
 4 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index a2a508d..5245101 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -10,7 +10,8 @@ SYNOPSIS
 --------
 [verse]
 'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]
-	 [--[no-]full] [--strict] [--verbose] [--lost-found] [<object>*]
+	 [--[no-]full] [--strict] [--verbose] [--lost-found]
+	 [--[no-]progress] [<object>*]
 
 DESCRIPTION
 -----------
@@ -72,6 +73,15 @@ index file, all SHA1 references in .git/refs/*, and all reflogs (unless
 	a blob, the contents are written into the file, rather than
 	its object name.
 
+--progress::
+--no-progress::
+	When fsck is run in a terminal, it will show the progress.
+	These options can force progress to be shown or not
+	regardless terminal check.
++
+Progress is not shown when --verbose is used. --progress is ignored
+in this case.
+
 It tests SHA1 and general object sanity, and it does full tracking of
 the resulting reachability and everything else. It prints out any
 corruption it finds (missing or bad objects), and if you use the
diff --git a/builtin/fsck.c b/builtin/fsck.c
index df1a88b..dc6a6fc 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -11,6 +11,7 @@
 #include "fsck.h"
 #include "parse-options.h"
 #include "dir.h"
+#include "progress.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -27,6 +28,8 @@ static const char *head_points_at;
 static int errors_found;
 static int write_lost_and_found;
 static int verbose;
+static int show_progress = -1;
+
 #define ERROR_OBJECT 01
 #define ERROR_REACHABLE 02
 
@@ -512,15 +515,20 @@ static void get_default_heads(void)
 static void fsck_object_dir(const char *path)
 {
 	int i;
+	struct progress *progress;
 
 	if (verbose)
 		fprintf(stderr, "Checking object directory\n");
 
+	if (show_progress)
+		progress = start_progress("Checking object directories", 256);
 	for (i = 0; i < 256; i++) {
 		static char dir[4096];
 		sprintf(dir, "%s/%02x", path, i);
 		fsck_dir(i, dir);
+		display_progress(progress, i+1);
 	}
+	stop_progress(&progress);
 	fsck_sha1_list();
 }
 
@@ -591,6 +599,7 @@ static struct option fsck_opts[] = {
 	OPT_BOOLEAN(0, "strict", &check_strict, "enable more strict checking"),
 	OPT_BOOLEAN(0, "lost-found", &write_lost_and_found,
 				"write dangling objects in .git/lost-found"),
+	OPT_BOOL   (0, "progress", &show_progress, "show progress"),
 	OPT_END(),
 };
 
@@ -603,6 +612,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	read_replace_refs = 0;
 
 	argc = parse_options(argc, argv, prefix, fsck_opts, fsck_usage, 0);
+
+	if (show_progress == -1)
+		show_progress = isatty(2);
+	if (verbose)
+		show_progress = 0;
+
 	if (write_lost_and_found) {
 		check_full = 1;
 		include_reflogs = 0;
@@ -622,20 +637,56 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 	if (check_full) {
 		struct packed_git *p;
+		int i, nr_packs = 0;
+		struct progress *progress = NULL;
 
 		prepare_packed_git();
+
 		for (p = packed_git; p; p = p->next)
+			nr_packs++;
+
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
+			if (show_progress) {
+				char buf[32];
+				snprintf(buf, sizeof(buf), "Verifying pack %d/%d",
+					 i, nr_packs);
+				if (open_pack_index(p))
+					continue;
+				progress = start_progress(buf, p->num_objects);
+			}
 			/* verify gives error messages itself */
-			verify_pack(p);
+			verify_pack(p, progress);
+
+			/*
+			 * we do not stop progress here, let the next
+			 * progress line overwrite the current one for
+			 * the next pack.
+			 */
+		}
+		stop_progress(&progress);
 
-		for (p = packed_git; p; p = p->next) {
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
+			char buf[64];
 			uint32_t j, num;
 			if (open_pack_index(p))
 				continue;
 			num = p->num_objects;
-			for (j = 0; j < num; j++)
+
+			snprintf(buf, sizeof(buf), "Checking objects on pack %d/%d",
+				 i, nr_packs);
+			if (show_progress)
+				progress = start_progress(buf, num);
+			for (j = 0; j < num; j++) {
 				fsck_sha1(nth_packed_object_sha1(p, j));
+				display_progress(progress, j+1);
+			}
+			/*
+			 * we do not stop progress here, let the next
+			 * progress line overwrite the current one for
+			 * the next pack.
+			 */
 		}
+		stop_progress(&progress);
 	}
 
 	heads = 0;
diff --git a/pack-check.c b/pack-check.c
index 0c19b6e..e30c18c 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "pack.h"
 #include "pack-revindex.h"
+#include "progress.h"
 
 struct idx_entry {
 	off_t                offset;
@@ -42,7 +43,8 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
 }
 
 static int verify_packfile(struct packed_git *p,
-		struct pack_window **w_curs)
+			   struct pack_window **w_curs,
+			   struct progress *progress)
 {
 	off_t index_size = p->index_size;
 	const unsigned char *index_base = p->index_data;
@@ -126,7 +128,10 @@ static int verify_packfile(struct packed_git *p,
 			break;
 		}
 		free(data);
+		if ((i & 1023) == 0)
+			display_progress(progress, i);
 	}
+	display_progress(progress, i);
 	free(entries);
 
 	return err;
@@ -155,7 +160,7 @@ int verify_pack_index(struct packed_git *p)
 	return err;
 }
 
-int verify_pack(struct packed_git *p)
+int verify_pack(struct packed_git *p, struct progress *progress)
 {
 	int err = 0;
 	struct pack_window *w_curs = NULL;
@@ -164,7 +169,7 @@ int verify_pack(struct packed_git *p)
 	if (!p->index_data)
 		return -1;
 
-	err |= verify_packfile(p, &w_curs);
+	err |= verify_packfile(p, &w_curs, progress);
 	unuse_pack(&w_curs);
 
 	return err;
diff --git a/pack.h b/pack.h
index 722a54e..572794f 100644
--- a/pack.h
+++ b/pack.h
@@ -70,10 +70,12 @@ struct pack_idx_entry {
 	off_t offset;
 };
 
+struct progress;
+
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
-extern int verify_pack(struct packed_git *);
+extern int verify_pack(struct packed_git *, struct progress *);
 extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 extern char *index_pack_lockfile(int fd);
 extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned char *);
-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply related

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Robin H. Johnson @ 2011-11-03  3:16 UTC (permalink / raw)
  To: Jeff King, Git Mailing List
  Cc: Shawn Pearce, Linus Torvalds, Junio C Hamano, James Bottomley,
	Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <20111103025532.GB9492@sigill.intra.peff.net>

On Wed, Nov 02, 2011 at 10:55:32PM -0400,  Jeff King wrote:
> But big projects that are interested in signatures probably want to say
> more. They want to say "this developer really wrote this commit". They
> want to say "QA passed this commit". They want to say "the history up to
> here looks good". And so on.
On the Gentoo side, we've also pondered the question of:
author != committer != pusher
And how to preserve many signatures from sources.

We're on a central repo model, with some ~250 committers.

I was originally primarily after the push certificates/signed-push, and
recording that data in the notes, but that still has the problems of
third-party verification as mentioned in the thread.

If we require that the tip of every push is a signed commit via a hook,
we get knowledge of the pushers. Either your real commit itself is
signed, or you have a signed merge commit on top, or you have a signed
empty commit. In all of the cases, I can verify your signature at the
recv hook. Having signed push in this case has a benefit that you could
ship the data as a bundle, or async from the signing.

The QA value of multiple signatures per commit is also valuable, to
assert SOB WITHOUT altering the commit. I see spearce's rant and the
retort, and really think there needs to be a middle ground - some of
commits that are coming from pulls, and not getting additional SOB,
could really benefit from them being recorded (I see them on mailing
lists, but not introduced since that would break 'stable' IDs).

> But they can't say those things without binding some data to the commit
> (i.e., making a certificate saying "this commit passed QA").  Data which
> might only make sense to assert much later than the commit is written.
> 
> So you're going to need to support detached commit signatures in some
> form anyway to make everybody happy. Which isn't to say in-commit
> signatures are wrong, but they are just one tool in a toolbox.
I was proposing that Git supports _all_ of these models:
- signed commits
- signed pushes (via certs)
- whatever signed lightweight tag idea happens
- existing annotated tags

Choices. Each with their own costs and advantages.

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Jeff King @ 2011-11-03  2:55 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Linus Torvalds, Junio C Hamano, git, James Bottomley, Jeff Garzik,
	Andrew Morton, linux-ide, LKML
In-Reply-To: <CAJo=hJv5nAKH_ptYSWfMvFQv0Dj+naPXK35wSzKYkfPOYsWkxg@mail.gmail.com>

On Wed, Nov 02, 2011 at 06:02:37PM -0700, Shawn O. Pearce wrote:

> > So I really think that signing the top commit itself is fundamentally wrong.
> 
> I really disagree. I like the signed commit approach. It allows for a
> lot more workflows than just providing a way for you to validate a
> pull from a trusted lieutenant. Debian/Gentoo folks want a way to sign
> every commit in their workflow. Just because you don't want that and
> think its crazy doesn't mean its not a valid workflow for that
> community and is something Git shouldn't support. I never use `git
> stash`. I hate the damn command. Yet its still there. I just choose
> not to use it. Junio's gpgsig header on each commit is also optional,
> and communities/contributors can choose to use (or ignore) the feature
> as they need to.

Stop for a minute and think about what it _means_ to sign a commit. Is
it saying "I wrote this commit?" Or "I think this commit is good?" Or "I
think all of the history leading to this is good?" It's obviously going
to be a per-project thing, but it's very constricting.  Leaving aside
all of the workflow issues Linus brought up (but which I do agree with),
think about what it would mean for Linus to fetch a commit from a
lieutenant and then sign it. Whatever it means, it can really only be
_one_ thing.

But big projects that are interested in signatures probably want to say
more. They want to say "this developer really wrote this commit". They
want to say "QA passed this commit". They want to say "the history up to
here looks good". And so on.

But they can't say those things without binding some data to the commit
(i.e., making a certificate saying "this commit passed QA").  Data which
might only make sense to assert much later than the commit is written.

So you're going to need to support detached commit signatures in some
form anyway to make everybody happy. Which isn't to say in-commit
signatures are wrong, but they are just one tool in a toolbox.

Personally, I think the only thing that makes sense to assert inside a
commit itself is that you are the author, and the author line of the key
should match the email UID of the signing key. And then anything you
want to say about _other_ people's commits (or even your own commits,
but later) should come in the form of detached signatures with some
content.

That's how signed tags work. It's not just Linus signing a commit. It's
Linus signing a binding between a commit and the statement "this is
v2.6.28". The only thing wrong with the signed tag model for more
general use is that you need some way of naming and organizing large
numbers of tags (e.g., several per commit if you have things like QA
signatures).

-Peff

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Jeff King @ 2011-11-03  2:42 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Junio C Hamano, Jonathan Nieder, netroby, Git Mail List,
	Tomas Carnecky
In-Reply-To: <CAJo=hJtt8vjB5oU+tEabN2AS7c-24bMHNwQSoWtZYtjjrR3d7Q@mail.gmail.com>

On Wed, Nov 02, 2011 at 05:06:53PM -0700, Shawn O. Pearce wrote:

> Yup, I agree. The "repo" tool used by Android does this in Python
> right now[1].  Its a simple hack, if the protocol is HTTP or HTTPS the
> client first tries to download $URL/clone.bundle. My servers have
> rules that trap on */clone.bundle and issue an HTTP 302 Found response
> to direct the client to a CDN. Works. :-)

I thought of doing something like that, but I wanted to be able to make
cross-domain links. The "302 to a CDN" thing is a clever hack, but it
requires more control of the webserver than some users might have. And
of course it doesn't work for the "redirect to git:// on a different
server" trick. Or redirect from "git://".

My thought of having it in "refs/mirrors" is only slightly less hacky,
but I think covers all of those cases. :)

> > Even if the bundle thing ends up too wasteful, it may still be useful to
> > offer a "if you don't have X, go see Y" type of mirror when "Y" is
> > something efficient, like git:// at a faster host (i.e., the "I built 3
> > commits on top of Linus" case).
> 
> Actually, I really think the bundle thing is wasteful. Its a ton of
> additional disk. Hosts like kernel.org want to use sendfile() when
> possible to handle bulk transfers. git:// is not efficient for them
> because we don't have sendfile() capability.

I didn't quite parse this. You say it is wasteful, but then indicate
that it can use sendfile(), which is a good thing.

However, I do agree with this:

> Its also expensive for kernel.org to create each Git repository twice
> on disk. The disk is cheap. Its the kernel buffer cache that is damned
> expensive. Assume for a minute that Linus' kernel repository is a
> popular thing to access. If 400M of that history is available in a
> normal pack file on disk, and again 400M is available as a "clone
> bundle thingy", kernel.org now has to eat 800M of disk buffer cache
> for that one Git repository, because both of those files are going to
> be hot.

Doubling the disk cache required is evil and ugly. I was hoping it
wouldn't matter because the bundle would be hosted on some far-away CDN
server anyway, though. But that is highly dependent on your setup. And
it's really just glossing over the fact that you have twice as many
servers. ;)

> I think I messed up with "repo" using a Git bundle file as its data
> source. What we should have done was a bog standard pack file. Then
> the client can download the pack file into the .git/objects/pack
> directory and just generate the index, reusing the entire dumb
> protocol transport logic. It also allows the server to pass out the
> same file the server retains for the repository itself, and thus makes
> the disk buffer cache only 400M for Linus' repository.

That would be cool, but what about ref tips? The pack is just a big blob
of objects, but we need ref tips to advertise to the server when we come
back via the smart protocol. We can make a guess about them, obviously,
but it would be nice to communicate them. I guess the mirror data could
include the tips and a pointer to a pack file.

Another issue with packs is that they generally aren't supposed to be
--thin on disk, whereas bundles can be. So I could point you to a
succession of bundles. Which is maybe a feature, or maybe just makes
things insanely complex[1].

> One (maybe dumb idea I had) was making the $GIT_DIR/objects/info/packs
> file contain other lines to list reference tips at the time the pack
> was made.

So yeah, that's another solution to the ref tip thingy, and that would
work. I don't think it would make a big difference whether the tips were
in the "mirror" file, or alongside the packfile. The latter I guess
might make administration easier. The "real" repo points its mirror one
time to a static pack store, and then the client goes and grabs whatever
it can from that store.

> Then we wind up with a git:// or ssh:// protocol extension that
> enables sendfile() on an entire pack, and to provide the matching
> objects/info/packs data to help a client over git:// or ssh://
> initialize off the existing pack files.

I think we can get around this by pointing git:// clients, either via
protocol extension or via a magic ref, to an http pack store. Sure, it's
an extra TCP connection, but that's not a big deal compared to doing an
initial clone of most repos.

So the sendfile() stuff would always happen over http.

> But either way, I like the idea of coupling the "resumable pack
> download" to the *existing* pack files, because this is easy to deal
> with.

Yeah, I'm liking that idea. In reference to my [1] above, what I've
started with is making:

  git fetch http://host/foo.bundle

work automatically. And it does work. But it actually spools the bundle
to disk and then unpacks from it, rather than placing it right into the
objects/pack directory. I did this because:

  1. We have to feed it to "index-pack --fix-thin", because bundles can
     be thin. So they're not suitable for sticking right into the pack
     directory.

  2. We could feed it straight to an index-pack pipe, but then we don't
     have a byte-for-byte file on disk to resume an interrupted
     transfer.

But spooling sucks, of course. It means we use twice as much disk space
during the index-pack as we would otherwise need to, not to mention the
latency of not starting the index-pack until we get the whole file.

Pulling down a non-thin packfile makes the problem go away. We can spool
it right into objects/pack, index it on the fly, and if all is well,
move it into its final filename. If the transfer is interrupted, you
drop what's been indexed so far, finish the transfer, and then re-start
the indexing from scratch (actually, the "on the fly" would probably
involve teaching index-pack to be clever about incrementally reading a
partially written file, but it should be possible).

-Peff

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  2:31 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CAJo=hJsXvSyB65KBp8sfciT=h5uZSqSUdxkpWtZJRtr4hXAh5A@mail.gmail.com>

On Wed, Nov 2, 2011 at 7:14 PM, Shawn Pearce <spearce@spearce.org> wrote:
>
> <rant>

I'm answering this separately, because it's a separate rant.

It's also totally bogus, but whatever.

> Get over it. Add the fucking empty commit to show the flow of a
> change. Stop forcing every fucking contributor to rebase/rewrite his
> commits just so someone higher up in the food chain can wank with
> their SOB line.

Shawn, stop using whatever drugs you are using.

NOBODY EVER REBASES ANYTHING FOR SIGNED-OFF-BY.

If they do, they are doing things very very wrong.

Signed-off-by: is *purely* for sending patches by email. No git
operations involved. None. Nada. Zilch. No rebasing involved, because
there's not even a git repository involved, for chissake!

Once something is in git, it's not signed off on - there should be a
sign-off-chain from the author to the committer, and that's it.
Anything else would be crazy.

So stop the crazy rants. Stop with the bad drugs. Seriously. You're
acting crazy.

                          Linus

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  2:25 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CAJo=hJsXvSyB65KBp8sfciT=h5uZSqSUdxkpWtZJRtr4hXAh5A@mail.gmail.com>

On Wed, Nov 2, 2011 at 7:14 PM, Shawn Pearce <spearce@spearce.org> wrote:
>
> So you propose we put the tag contents into the merge commit message
> so it can be verified after the fact? So merges are now going to be
> something much more horrific to read, because it will end with Git
> object tag cruft, the tag message, and the PGP signature spew that no
> human can decode in the head?

Actually, I wanted to just drop the damn thing.

To me, the point of the tag is so that the person doing the merge can
verify that he merges something trusted.

However, everybody else seems to disagree, and wants that stupid
signature to live along in the repository. And I can live with that,
although I do agree with you that it's not exactly pretty. I can live
with "ugly signature that I don't care for" way more than "stupid
design".

Because unlike your crazy empty commit, it at least fits the workflow,
and it certainly isn't any uglier that extraneous pointless commit.

You can disagree. You obviously do. I simply don't care. Because I'm right.

(And your claim that it's big UI fixes and protocol changes is pure
and utter garbage. I just sent a patch that cleans the code up,
removes a line that improperly drops information and gets rid of the
biggest problem with our current handling of tags. No protocol changes
involved, no big UI fixup).

                        Linus

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  2:19 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFwXu=+HdQ5nW11Ts5p-V=KgpxjyagKqB+Xv+qBOEEWXvQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On Wed, Nov 2, 2011 at 6:45 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>   [torvalds@i5 linux]$ git fetch git://github.com/rustyrussell/linux.git  refs/tags/rusty@rustcorp.com.au-v3.1-8068-g5087a50

So this trivial patch removes one line of code, and makes this actually work.

However, it also makes us fail many tests that *test* that we peeled
what we fetched. However, I think the tests are wrong.

If the tag doesn't resolve into a commit, we happily output the SHA1
of the tag itself - and we say that it shouldn't be merged.

And it the tag *does* resolve into a commit, why would we output the
SHA1 of the commit? The tag should be peeled properly later when it
gets used, so peeling it here seems to be just a misfeature that makes
signed tags not work well.

So I suspect we should just apply this patch, but I didn't check
exacty what the failed tests are - except for the first one, that just
compares against a canned response (and the canned response should
just be changed). Maybe there was some reason for the peeling,
although I suspect it was just a fairly mindless case of "make it a
commit, because the merge needs the commit" - never mind that the
merge would peel it anyway.

                           Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 550 bytes --]

 builtin/fetch.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 91731b909aeb..494a7f9976f8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -436,8 +436,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 		note[note_len] = '\0';
 		fprintf(fp, "%s\t%s\t%s",
-			sha1_to_hex(commit ? commit->object.sha1 :
-				    rm->old_sha1),
+			sha1_to_hex(rm->old_sha1),
 			rm->merge ? "" : "not-for-merge",
 			note);
 		for (i = 0; i < url_len; ++i)

^ permalink raw reply related

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Shawn Pearce @ 2011-11-03  2:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFwXu=+HdQ5nW11Ts5p-V=KgpxjyagKqB+Xv+qBOEEWXvQ@mail.gmail.com>

On Wed, Nov 2, 2011 at 18:45, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 2, 2011 at 6:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'm not saying that you shouldn't use them - go ahead and use the
>> feature if you like it. But please spare me your excuses for stupid
>> workarounds that come from the fact that they aren't a good match for
>> sane workflows.

We often disagree. :-)

> Btw, having now done odd things with signed tags (because we've used
> them as a side-band verification mechanism), I can certainly also say
> that the signed tags have their set of problems too.
...
> But practically, all of these issues should be pretty easily solvable.
> So it should be quite easy to make
>
>    git pull <repo> <tag-name>
>
> just do the right thing - including verifying the tag, and adding the
> information in the tag into the merge commit message.

Uhm, sure.

Quoting you 2 days ago:

On Mon, Oct 31, 2011 at 15:52, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Oct 31, 2011 at 3:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> So nobody is worried about this (quoting from my earlier message)?
>
> No, because you haven't been reading what we write.
>
> The tag is useless.
>
> The information *in* the tag is not. But it shouldn't be saved in the
> tag (or note, or whatever). Because that's just an annoying place for
> it to be, with no upside.
>
> Save it in the commit we generate. BAM! Useful, readable, permanent,
> and independently verifiable.

So you propose we put the tag contents into the merge commit message
so it can be verified after the fact? So merges are now going to be
something much more horrific to read, because it will end with Git
object tag cruft, the tag message, and the PGP signature spew that no
human can decode in the head?

Oh, right, tags are almost good enough. Elsewhere in this thread you
also stated we have to redo the way tags are signed so that the tag
message body itself is not part of the signature, allowing you to fix
spelin errors so you are not stuck with them in your commit history.
But I assume we will have to keep the more typical headers of object /
type / tag / tagger fields, as that is the key information the
signature needs to be over to be of any value. So now there will be
two different ways in which a Git annotated tag object will have its
signature created, as certainly you don't mean to remove the tag
message body from the PGP signature content for release tags.

I fail to see how shoving Git object data fields and a complete PGP
signature block into a merge commit message body, which will show by
default in all git log type tools, and exist in cherry-picks or
rebases that might make that data less valuable, is somehow better
than the gpgsig header that neatly tucks it away until requested. I
also fail to see how scraping the message body for the proper fields
in order to implement automated verification of the signature (because
no human can do it themselves and copy-paste sucks) is a good idea.
Everywhere else in Git that we have machine readable formats its very
well structured so that no guessing is required.

> So signed tags are not mis-designed from a conceptual standpoint -
> they just work really really awkwardly right now for what the kernel
> would like to do with them.
>
> With a few UI fixes, I think the signed tag thing would "just work".

Well, UI fixes, protocol changes, improvements to manage a large
reference space which we have previously said is an insane and stupid
workflow, etc. One reason you picked up all of those extra tags was
the include-tag capability kicking on and picking up older tag
history. We now have to disable it in certain cases.

Its not just a few UI fixes. And there is a lot more work to write a
verify for the tag contents+signature that appears in the body of a
merge commit message. Not to mention we now have to do that verify
logic twice, once in the signed pull request tag like but not quite a
tag but uses a tag thing you are advocating, and again for the merge
commit message body that contains the tag object data that we don't
normally show to an end user, but will now be in every merge commit
you make.

Go ahead and call me stupid, but this already is a bigger amount of
surgery to the git-core code, not to mention worse user experience for
the average `git log` reading human, than having a hidden by default
gpgsig header that might ask a contributor to take 2 extra seconds
before making a commit to consider the useful lifespan of that commit.
Or $DEITY forbid, write a new empty commit to record the equivalent of
their Signed-off-by.

Oh, and while I am on that subject...


<rant>
I have never grasped why sometimes a Signed-off-by is added to a
patch, and why sometimes its not. It seems to be this weird function
of "If the commit SHA-1 is already stable DON'T FUCKING TOUCH IT BY
ADDING SIGNED-OFF-BY IT RUINS THE HISTORY", but if you are too far
down the food chain to be fortunate enough for your commit SHA-1 to
remain frozen, the Signed-off-by has to be added to assert that the
code can be contributed. It sounds like the workflow developed around
where it wasn't acceptable to force history rewriting, you suffer by
not having the SOB, but whenever possible you force a history rewrite
on the contributor just so you can add a SOB and feel good about the
fact that the SOB is added to the commit message.

Get over it. Add the fucking empty commit to show the flow of a
change. Stop forcing every fucking contributor to rebase/rewrite his
commits just so someone higher up in the food chain can wank with
their SOB line.

Everyone I talk to that contributes code to the kernel who isn't Linus
or Ted Tso complains about this, and then asks me to fucking fix it.
They want stable SHA-1s so they know their change arrived into Linus'
tree unmolested. Unfortunately, despite their volume of changes, they
aren't high enough in the food chain to be this lucky. Nope, someone
has to wank their SOB in first. And maybe fix a spelin error.
</rant>

^ permalink raw reply

* Re: long fsck time
From: Nguyen Thai Ngoc Duy @ 2011-11-03  1:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List
In-Reply-To: <20111102213332.GA14108@sigill.intra.peff.net>

2011/11/3 Jeff King <peff@peff.net>:
> On Wed, Nov 02, 2011 at 07:10:26PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Wed, Nov 2, 2011 at 7:06 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> > On git.git
>> >
>> > $ /usr/bin/time git fsck
>> > 333.25user 4.28system 5:37.59elapsed 99%CPU (0avgtext+0avgdata
>> > 420080maxresident)k
>> > 0inputs+0outputs (0major+726560minor)pagefaults 0swaps
>> >
>> > That's really long time, perhaps we should print progress so users
>> > know it's still running?
>>
>> Ahh.. --verbose. Sorry for the noise. Still good to show the number of
>> checked objects though.
>
> fsck --verbose is _really_ verbose. It could probably stand to have some
> progress meters sprinkled throughout. The patch below produces this on
> my git.git repo:


Yes, I wanted something like this.

>  $ git fsck
>  Checking object directories: 100% (256/256), done.
>  Verifying packs: 100% (7/7), done.
>  Checking objects (pack 1/7): 100% (241/241), done.
>  Checking objects (pack 2/7): 100% (176/176), done.
>  Checking objects (pack 3/7): 100% (312/312), done.
>  Checking objects (pack 4/7): 100% (252/252), done.
>  Checking objects (pack 5/7): 100% (353/353), done.
>  Checking objects (pack 6/7): 100% (375/375), done.
>  Checking objects (pack 7/7): 100% (171079/171079), done.

Would be better if we only output one "Checking objects" line.

> which gives reasonably smooth progress. The longest hang is that
> "Verifying pack" 7 is slow (I believe it's doing a sha1 over the whole
> thing). If you really wanted to get fancy, you could probably do a
> throughput meter as we sha1 the whole contents.

I'll give it a try.

> Patch is below. It would need --{no-,}progress support on the command
> line, and to check isatty(2) before it would be acceptable.

Agreed on isatty(), though I think this output should be default (with
maybe --quiet to silence it on tty). Other messages may be prepended
with severity to indicate they are not progress output.
-- 
Duy

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  1:45 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFx0oCd6-sh0psYxho-s=sHAK0RHXJHfLewRuUcdXzxZbg@mail.gmail.com>

On Wed, Nov 2, 2011 at 6:19 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm not saying that you shouldn't use them - go ahead and use the
> feature if you like it. But please spare me your excuses for stupid
> workarounds that come from the fact that they aren't a good match for
> sane workflows.

Btw, having now done odd things with signed tags (because we've used
them as a side-band verification mechanism), I can certainly also say
that the signed tags have their set of problems too.

So signed tags aren't perfect. They were designed for making releases,
and that shows very clearly in how git works with them. The default
choices that git makes are very awkward indeed when you use signed
tags as "security tokens".

But unlike the "sign the commit" approach, those are implementation
and UI issues, not "fundamentally broken design" issues.

For example, fetching a single signed tag with git is surprisingly
hard. It *shouldn't* be hard - and there's no underlying technical or
design reason why it would be hard, but it is. Why? Because all the
git actions when it comes to tags are all geared towards one
particular use, that is *not* about the signature checking aspect of
them.

Here's an example: Rusty Russell now makes nice signed tags for the
things he asks me to pull, and then states them in the pull message.
So he will mention that he has a tag named

   rusty@rustcorp.com.au-v3.1-8068-g5087a50

in his git repository at

   git://github.com/rustyrussell/linux.git

and while I don't think his tag names are all that wonderful, it makes
sense from an automated script kind of standpoint.

Now, let's try to get that tag:

  [torvalds@i5 linux]$ git fetch
git://github.com/rustyrussell/linux.git
rusty@rustcorp.com.au-v3.1-8068-g5087a50
  fatal: Couldn't find remote ref rusty@rustcorp.com.au-v3.1-8068-g5087a50

oops. Ok, so his tag naming is *really* akward. Whatever. Let's try again:

   [torvalds@i5 linux]$ git fetch
git://github.com/rustyrussell/linux.git
refs/tags/rusty@rustcorp.com.au-v3.1-8068-g5087a50
   From git://github.com/rustyrussell/linux
    * tag
rusty@rustcorp.com.au-v3.1-8068-g5087a50 -> FETCH_HEAD

Ahh, success!

Oops. Nope. It turns out that git will *peel* the tag when you fetch
it, so FETCH_HEAD actually doesn't contain the tag object at all, but
the commit object that the tag pointed to. MAJOR FAIL.

Quite frankly, I think that's a git bug, but it's a git bug because
"git fetch" was designed to get the commit to merge. Fair enough.
Let's work around it, and rename the tag at the same time:

   [torvalds@i5 linux]$ git fetch
git://github.com/rustyrussell/linux.git
refs/tags/rusty@rustcorp.com.au-v3.1-8068-g5087a50:refs/tags/rusty
   From git://github.com/rustyrussell/linux
    * [new tag]
rusty@rustcorp.com.au-v3.1-8068-g5087a50 -> rusty
    * [new tag]
rusty@rustcorp.com.au-v3.1-2-gb1e4d20 ->
rusty@rustcorp.com.au-v3.1-2-gb1e4d20
    * [new tag]
rusty@rustcorp.com.au-v3.1-4896-g0acf000 ->
rusty@rustcorp.com.au-v3.1-4896-g0acf000
    * [new tag]
rusty@rustcorp.com.au-v3.1-8068-g5087a50 ->
rusty@rustcorp.com.au-v3.1-8068-g5087a50

WTF? Now we finally *did* get the tag, and we can do

   git verify-tag rusty

and that will work. But what the hell happened? We got three other
tags too that we didn't even ask for!

So we have actual git bugs here, that relate to the fact that we've
treated signed tags specially, and have magic code to basically say
"if there's a signed tag that is reachable from the thing you pull,
and you're not just doing a temporary pull into FETCH_HEAD, we'll
fetch that signed tag too".

Again - not a fundamental design mistake in the data structures, and
it actually made sense from a "signed tags are important release
points" standpoint, but it makes it *really* inconvenient to use
signed tags for signature verification.

Also, the fact that the signed tag gets peeled when we do fetch into
FETCH_HEAD also means that we can't actually save the signature in
resulting the merge commit. The merge, instead of being able to
perhaps save the information that we merged a nice trusted signed
point, only has the commit.

But practically, all of these issues should be pretty easily solvable.
So it should be quite easy to make

    git pull <repo> <tag-name>

just do the right thing - including verifying the tag, and adding the
information in the tag into the merge commit message.

So signed tags are not mis-designed from a conceptual standpoint -
they just work really really awkwardly right now for what the kernel
would like to do with them.

With a few UI fixes, I think the signed tag thing would "just work".

That said, I do think that the "signature in the pull request" should
also "just work", and I'm not entirely sure which one is better. It
might be more convenient to get the signature data from the pull
request. So I'm not at all married the the notion of using signed tags
for this.

                       Linus

^ permalink raw reply

* Re: t5800-*.sh: Intermittent test failures
From: Junio C Hamano @ 2011-11-03  1:30 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Alex Riesen, Ævar Arnfjörð, Ramsay Jones,
	Jeff King, GIT Mailing-list, Jonathan Nieder
In-Reply-To: <CAGdFq_h+Hpv9perLTU2rbdT6oZ3kZy22t5nghJQeEjNGvunL+A@mail.gmail.com>

Sverre Rabbelier <srabbelier@gmail.com> writes:

> Ævar, this seems like something we could look at during the mini
> GitTogether in Amsterdam this Saturday, no?

Have fun.

I think I happened to hit this while testing today's 'pu' that hasn't been
pushed out. The process chain looks like this:

pid  command                     stuck at
4767 sh t5800-remote-helpers.sh  wait4(-1)
 4793 git push                   read(6)
  4809 git-remote-testgit        wait4(4906)
   4906 git fast-import          wait4(4912)
    4912 git-fast-import         read(0)

lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/6 -> pipe:[133037701]
l-wx------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/7 -> pipe:[133037700]
lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/8 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/1 -> pipe:[133037701]

So "git push (4793)" is stuck reading from pipe:[133037701], expecting the
innermost "git-fast-import (4912)" to write to it via its standard output,
but the latter is waiting to read from pipe:[133037700], hoping the former
to write to it via its fd#7.

Does this deadlock ring a bell to anybody who's involved in these
codepaths?

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  1:19 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CAJo=hJv5nAKH_ptYSWfMvFQv0Dj+naPXK35wSzKYkfPOYsWkxg@mail.gmail.com>

On Wed, Nov 2, 2011 at 6:02 PM, Shawn Pearce <spearce@spearce.org> wrote:
>>
>> So I really think that signing the top commit itself is fundamentally wrong.
>
> I really disagree. I like the signed commit approach.

If you like it so much, go ahead and use them.

But stop with the crazy excuses for the downsides. I explained exactly
why amending is stupid and wrong, and why empty commits are f*cking
moronic. But even apart from the *technical* problems with the stupid
mis-designed feature, I explained why it was fundamentally broken from
a workflow standpoint too.

I'm not saying that you shouldn't use them - go ahead and use the
feature if you like it. But please spare me your excuses for stupid
workarounds that come from the fact that they aren't a good match for
sane workflows.

                       Linus

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Shawn Pearce @ 2011-11-03  1:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFz7TeQQH3D4Tpp31cZYZoQKeK37jouo+2Kh61Wa07knfw@mail.gmail.com>

On Wed, Nov 2, 2011 at 13:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Nov 1, 2011 at 2:56 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> But on the other hand, in many ways, publishing your commit to the outside
>> world, not necessarily for getting pulled into the final destination
>> (i.e. your tree) but merely for other people to try it out, is the point
>> of no return (aka "don't rewind or rebase once you publish").  "pushing
>> out" might be less special than "please pull", but it still is special.
>
> So I really think that signing the top commit itself is fundamentally wrong.

I really disagree. I like the signed commit approach. It allows for a
lot more workflows than just providing a way for you to validate a
pull from a trusted lieutenant. Debian/Gentoo folks want a way to sign
every commit in their workflow. Just because you don't want that and
think its crazy doesn't mean its not a valid workflow for that
community and is something Git shouldn't support. I never use `git
stash`. I hate the damn command. Yet its still there. I just choose
not to use it. Junio's gpgsig header on each commit is also optional,
and communities/contributors can choose to use (or ignore) the feature
as they need to.

> That commit may not even be *yours*. You may have pulled it from a
> sub-lieutenant as a fast-forward, or similar. Amending it later would
> be actively very very *wrong*.

Obviously you shouldn't amend a commit that would otherwise be a
fast-forward. But why not write a new empty signed commit on top, and
teach `git log` without the verify signatures flag to skip over
commits that have a gpgsig header line, have exactly one parent, and
whose parent tree matches the commit's own tree? This removes these
commits from the normal `git log` revision output, but yet the flow of
changes is still very visible within the history.

As I understand it, the point of multiple Signed-off-by lines in
commit message bodies is to show the flow of a change, who reviewed
and applied a given commit, until it finally lands in a tree where its
commit SHA-1 is frozen in stone and you can later pull it. The empty
signed commit on top of a fast-forward provides that same flow of a
change, readily visible with standard `git log` tools, but doesn't
have to clutter up history if we teach log how to skip this particular
type. Similar to the --no-merges way to skip merges. :-)

> So quite frankly, I think the stuff in pu (or next?) is completely
> mis-designed. Doing it in the commit is wrong for fundamental reasons,
> which all boil down to a simple issue:

Totally disagree. I'm really in favor of embedding these into the
commit headers the way Junio has done.

>  - you absolutely *need* to add the signature later. You *cannot* do
> it at "git commit" time.

Why can't you add it at commit time? What is stopping me from running
`git commit -S` every time I make a commit? Is it that my fingers will
wear out more quickly because I have to type my pass-phrase too often?

What is wrong with making a signed commit on a commit I have a high
level of confidence in, but not signing the others? In my own workflow
I make a lot of commit --amends  / rebases until I am pretty confident
in the code being written and organized the way I think it should be
for distribution to others. But at some point in that workflow I'm
doing an --amend or a rebase to make that last final touch, and during
that commit I can add -S to make it signed, because I'm pretty certain
its ready to go. At that point, barring some horrific bug or reviewer
comments, I am unlikely to change the commit. I know at the time I
make that commit that I am pretty confident in the commit, so I take
the extra few key strokes to sign it.

> That's a fundamental issue both from a "workflow model" issue (ie you
> want to sign stuff after it has passed testing etc,

Why do I have to wait until its tested to sign it? The gpgsig
signature isn't any more special than the Signed-off-by line I put
into my commit message to agree to the developer's certificate of
origin, nor is it any more special than the committer line in the
commit header. Its just a statement on the commit that I have a
reasonable enough confidence in the value of this particular commit
and its ancestors that I should take the time to unlock my GPG key and
sign the content in case I do distribute this to others.

If you are going to spend time testing a commit, its probably going to
take longer to perform that testing than it is to perform the GPG key
unlock and signature. So why are you complaining about the time it
takes to sign something you think is worthy of testing?  If the tests
fail, you'll need to rewind/amend/whatever to address the breakage. If
the tests pass, the commit is already signed and ready for
distribution. If you are spending a lot of time signing commits that
are highly likely to fail tests, well, maybe you should look at other
ways to improve your workflow so that you have a higher level of
confidence in the code you record and assume will be a permanent part
of the project's history.

> but you may need
> to commit it in order to *get* testing),

Maybe consider allowing a ".dirty" suffix like git-core does on
builds? Or if you are submitting the code to a remote test cluster
that auto-compiles the code for you (and that is why you need a
commit), it sounds like the time it takes for that to push, compile,
test, and report back is way higher than the time it takes to make the
signature. So you probably should only be submitting something that
you had a reasonable level of confidence in. So you should go ahead
and sign it before sending it for testing, in case the tests do pass
and you want to publish that commit.

> as well as from a
> "fundamental git datastructures" issue (ie you would want to sign
> commits that aren't yours.

Sure. But this is why you can make an empty commit and sign that.

> "git commit --amend" is not the answer - that destroys the fundamental
> concept of history being immutable, and while it works for your local
> commits, it doesn't work for anybody elses commits, or for stuff you
> already pushed out.

Nobody said you had to amend everything. You can add an empty commit.

> And "add a fake empty commit just for the signature" is not the answer
> either - because that is clearly inferior to the tags we already had.

Really? I disagree. The commit DAG scales quite well. The tag
namespace does not. A refs/signatures/$COMMIT_SHA1 namespace also does
not scale well.

An empty commit with a gpgsig header has about the same object cost as
an annotated tag once packed. But it has the advantage that the damn
thing doesn't clog up the reference space, the reference handling
code, or the advertisements in the native protocol. As history goes
on, older signatures are less relevant, and automatically are
avoided/skipped/bypassed by the normal DAG walking code. Tags don't do
this well because they have no relationship to the project history.

The only downside to an empty commit with the gpgsig header is I
cannot grab an arbitrarily deep ancestor and say "Who has signed a
commit that depends on this"? Today we already have this with git
describe --contains (aka git name-rev) for annotated tags. Its a new
feature we have to teach to some part of the log machinery, but the
algorithm will be easier because it doesn't have to mess with the
mapping table of tag objects. It just has to start digging from roots,
remembering each commit that has a gpgsig on any given branch path,
and then outputting the matches when it finds the commit in question.

The commit approach also has the advantage that your tree
automatically carries any lieutenant's signatures, by virtue of them
already being frozen in the commits.  This allows anyone downstream of
you to verify the same signatures, and check them against their own
keyring contents. If the signatures are all detached in some transient
annotated tag space, its impossible for anyone other than you to
verify pull requests. I would hate to say we have this nice
distributed version control system, but only Linus can prove the pull
requests in his repository are what they claim, and we have to then
implicitly trust you to resign that data without the original
signatures being present. $DAY_JOB would feel a lot better about the
integrity of the Linux kernel repository if _ANYONE_ can validate pull
requests offline after they have happened.

> I dunno. Did I miss something? As far as I can tell, the signed tags
> that we've had since day one are *clearly* much better in very
> fundamental ways.

Completely disagree. :-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox