git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* send-pack: limit on negative references
@ 2006-09-05 12:45 Andy Whitcroft
  2006-09-05 16:23 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2006-09-05 12:45 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

I've been having trouble with git push apparently resending the entire
commit trace for the branch each and every time I push; even with short
branch names.  This seems to be related to the changes made to handle
the case where the remote end has a large number of branches (>900).

The problem there seemed to be that we would fill list-revs' argument
list in reference order with negative and positive limits.  This could
lead to us to exceed the limit 900 argument limit and bomb.  The changes
made in commit 797656e58ddbd82ac461a5142ed726db3a4d0ac0 split the
parameter loading into two phases.  First we load the +ve and -ve
references for the branches we are definatly trying to send; bailing if
we cannot fit those references onto the command line.  Second we load up
the remaining remote references as -ve references to limit duplicate
object sends.

This second phase loads up to 16 additional references onto the command
line.  Now this effectively limits you to 16 branchs in your remote
repository if you wish to ensure you catch a parental head to copy from.
 This can lead us to send the entire commit stream for the branch even
though the branch may contain no new commits.

It seems to me that any and all negative references we have are helpful
and as such we should be loading as many as we can into the list.  If we
cannot fit them all we should _not_ error out, but we should not limit
ourselves to any less than as many as we can fit.

Can anyone see a downside to the patch below?

-apw

[-- Attachment #2: send-pack-supply-as-many-negative-references-as-we-can-fit --]
[-- Type: text/plain, Size: 1211 bytes --]

send-pack: supply as many negative references as we can fit

In commit 797656e58ddbd82ac461a5142ed726db3a4d0ac0 list-revs' command
line was reordered to ensure the branches we are pushing are first on
the list, and then a subset of the remaining remote references are
passed to limit our push.  Given that we now have the needed positive
references on the list first, it seems reasonable to load as many
negative references onto the list as will fit without error.

Signed-off-by: Andy Whitcroft <andyw@uk.ibm.com>
---
diff --git a/send-pack.c b/send-pack.c
index 10bc8bc..5d55241 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -40,7 +40,7 @@ static void exec_rev_list(struct ref *re
 {
 	struct ref *ref;
 	static const char *args[1000];
-	int i = 0, j;
+	int i = 0;
 
 	args[i++] = "rev-list";	/* 0 */
 	if (use_thin_pack)	/* 1 */
@@ -69,8 +69,8 @@ static void exec_rev_list(struct ref *re
 	/* Then a handful of the remainder
 	 * NEEDSWORK: we would be better off if used the newer ones first.
 	 */
-	for (ref = refs, j = i + 16;
-	     i < 900 && i < j && ref;
+	for (ref = refs;
+	     i < 900 && ref;
 	     ref = ref->next) {
 		if (is_zero_sha1(ref->new_sha1) &&
 		    !is_zero_sha1(ref->old_sha1) &&

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: send-pack: limit on negative references
  2006-09-05 12:45 send-pack: limit on negative references Andy Whitcroft
@ 2006-09-05 16:23 ` Junio C Hamano
  2006-09-05 21:32   ` Andy Whitcroft
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2006-09-05 16:23 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: git

Andy Whitcroft <apw@shadowen.org> writes:

> I've been having trouble with git push apparently resending the entire
> commit trace for the branch each and every time I push; even with short
> branch names.  This seems to be related to the changes made to handle
> the case where the remote end has a large number of branches (>900).

I think the right fix is to do one or both of the following, and
lift that 900 cut-off entirely.

One is to teach rev-list to read the information it is taking
from the command line instead from its standard input.

Another is to teach pack-object the same trick on top of my
patch last night.  This has an added benefit that we save one
pipe+fork+exec there.

These are essentially suggestions from Linus made twice
separately in the past, so they must be on the right track ;-).

If nobody does, I would do it myself, but the list is welcome to
beat me to it.  Especially, the former (giving --stdin option to
rev-list) should be trivial.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: send-pack: limit on negative references
  2006-09-05 16:23 ` Junio C Hamano
@ 2006-09-05 21:32   ` Andy Whitcroft
  2006-09-05 21:51     ` [PATCH 1/2] rev list add option accepting revision constraints on standard input Andy Whitcroft
  2006-09-05 21:52     ` [PATCH 2/2] send pack switch to using git rev list stdin Andy Whitcroft
  0 siblings, 2 replies; 8+ messages in thread
From: Andy Whitcroft @ 2006-09-05 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Andy Whitcroft <apw@shadowen.org> writes:
> 
>> I've been having trouble with git push apparently resending the entire
>> commit trace for the branch each and every time I push; even with short
>> branch names.  This seems to be related to the changes made to handle
>> the case where the remote end has a large number of branches (>900).
> 
> I think the right fix is to do one or both of the following, and
> lift that 900 cut-off entirely.
> 
> One is to teach rev-list to read the information it is taking
> from the command line instead from its standard input.
> 
> Another is to teach pack-object the same trick on top of my
> patch last night.  This has an added benefit that we save one
> pipe+fork+exec there.
> 
> These are essentially suggestions from Linus made twice
> separately in the past, so they must be on the right track ;-).
> 
> If nobody does, I would do it myself, but the list is welcome to
> beat me to it.  Especially, the former (giving --stdin option to
> rev-list) should be trivial.

Ok, I'll have a go at this one.  As you say it looks pretty simple to
reuse the command line revision parser on an incoming stream.

-apw

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] rev list add option accepting revision constraints on standard input
  2006-09-05 21:32   ` Andy Whitcroft
@ 2006-09-05 21:51     ` Andy Whitcroft
  2006-09-05 22:10       ` Junio C Hamano
  2006-09-05 21:52     ` [PATCH 2/2] send pack switch to using git rev list stdin Andy Whitcroft
  1 sibling, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2006-09-05 21:51 UTC (permalink / raw)
  To: git

rev-list: add option accepting revision constraints on standard input

When we are generating packs to update remote repositories we
want to supply as much information as possible about the revisions
that already exist to rev-list in order optimise the pack as much
as possible.  We need to pass two revisions for each branch we are
updating in the remote repository and one for each additional branch.
Where the remote repository has numerous branches we can run out
of command line space to pass them.

Add a --stdin flag which causes rev-list to additionally read
its stdin stream and parse that for revision constraints.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 8437454..0303909 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -304,10 +304,30 @@ static void mark_edges_uninteresting(str
 	}
 }
 
+/*
+ * Parse revision information, filling in the "rev_info" structure,
+ * revisions are taken from stream.
+ */
+static void setup_revisions_stream(FILE *stream, struct rev_info *revs)
+{
+	char line[1000];
+	const char *args[] = { 0, line, 0 };
+
+	while (fgets(line, sizeof(line), stream) != NULL) {
+		line[strlen(line) - 1] = 0;
+
+		if (line[0] == '-')
+			die("options not supported in --stdin mode");
+
+		(void)setup_revisions(2, args, revs, NULL);
+	}
+}
+
 int cmd_rev_list(int argc, const char **argv, const char *prefix)
 {
 	struct commit_list *list;
 	int i;
+	int read_stdin = 0;
 
 	init_revisions(&revs, prefix);
 	revs.abbrev = 0;
@@ -329,9 +349,15 @@ int cmd_rev_list(int argc, const char **
 			bisect_list = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--stdin")) {
+			read_stdin = 1;
+			continue;
+		}
 		usage(rev_list_usage);
-
 	}
+	if (read_stdin)
+		setup_revisions_stream(stdin, &revs);
+
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
 		/* The command line has a --pretty  */
 		hdr_termination = '\n';

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] send pack switch to using git rev list stdin
  2006-09-05 21:32   ` Andy Whitcroft
  2006-09-05 21:51     ` [PATCH 1/2] rev list add option accepting revision constraints on standard input Andy Whitcroft
@ 2006-09-05 21:52     ` Andy Whitcroft
  1 sibling, 0 replies; 8+ messages in thread
From: Andy Whitcroft @ 2006-09-05 21:52 UTC (permalink / raw)
  To: git

send-pack: switch to using git-rev-list --stdin

When we are generating packs to update remote repositories we
want to supply as much information as possible about the revisions
that already exist to rev-list in order optimise the pack as much
as possible.  We need to pass two revisions for each branch we are
updating in the remote repository and one for each additional branch.
Where the remote repository has numerous branches we can run out
of command line space to pass them.

Utilise the git-rev-list --stdin mode to allow unlimited numbers
of revision constraints.  This allows us to move back to the much
simpler unordered revision selection code.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/send-pack.c b/send-pack.c
index ac4501d..f6e5eed 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -38,9 +38,8 @@ static void exec_pack_objects(void)
 
 static void exec_rev_list(struct ref *refs)
 {
-	struct ref *ref;
-	static const char *args[1000];
-	int i = 0, j;
+	static const char *args[4];
+	int i = 0;
 
 	args[i++] = "rev-list";	/* 0 */
 	if (use_thin_pack)	/* 1 */
@@ -48,43 +47,29 @@ static void exec_rev_list(struct ref *re
 	else
 		args[i++] = "--objects";
 
-	/* First send the ones we care about most */
-	for (ref = refs; ref; ref = ref->next) {
-		if (900 < i)
-			die("git-rev-list environment overflow");
-		if (!is_zero_sha1(ref->new_sha1)) {
-			char *buf = xmalloc(100);
-			args[i++] = buf;
-			snprintf(buf, 50, "%s", sha1_to_hex(ref->new_sha1));
-			buf += 50;
-			if (!is_zero_sha1(ref->old_sha1) &&
-			    has_sha1_file(ref->old_sha1)) {
-				args[i++] = buf;
-				snprintf(buf, 50, "^%s",
-					 sha1_to_hex(ref->old_sha1));
-			}
-		}
-	}
+	args[i++] = "--stdin";
 
-	/* Then a handful of the remainder
-	 * NEEDSWORK: we would be better off if used the newer ones first.
-	 */
-	for (ref = refs, j = i + 16;
-	     i < 900 && i < j && ref;
-	     ref = ref->next) {
-		if (is_zero_sha1(ref->new_sha1) &&
-		    !is_zero_sha1(ref->old_sha1) &&
-		    has_sha1_file(ref->old_sha1)) {
-			char *buf = xmalloc(42);
-			args[i++] = buf;
-			snprintf(buf, 42, "^%s", sha1_to_hex(ref->old_sha1));
-		}
-	}
 	args[i] = NULL;
 	execv_git_cmd(args);
 	die("git-rev-list exec failed (%s)", strerror(errno));
 }
 
+static void builtin_rev_list_generate(struct ref *refs)
+{
+        while (refs) {
+                if (!is_zero_sha1(refs->old_sha1) &&
+                    has_sha1_file(refs->old_sha1)) {
+                        printf("^%s", sha1_to_hex(refs->old_sha1));
+                }
+                if (!is_zero_sha1(refs->new_sha1)) {
+                        printf("%s", sha1_to_hex(refs->new_sha1));
+                }
+                refs = refs->next;
+        }
+
+	exit(0);
+}
+
 static void rev_list(int fd, struct ref *refs)
 {
 	int pipe_fd[2];
@@ -111,13 +96,38 @@ static void rev_list(int fd, struct ref 
 	exec_rev_list(refs);
 }
 
+static void rev_list_generate(int fd, struct ref *refs)
+{
+	int pipe_fd[2];
+	pid_t rev_list_generate_pid;
+
+	if (pipe(pipe_fd) < 0)
+		die("rev-list-generate setup: pipe failed");
+	rev_list_generate_pid = fork();
+	if (!rev_list_generate_pid) {
+		dup2(pipe_fd[0], 0);
+		dup2(fd, 1);
+		close(pipe_fd[0]);
+		close(pipe_fd[1]);
+		close(fd);
+		rev_list(fd, refs);
+		die("rev-list setup failed");
+	}
+	if (rev_list_generate_pid < 0)
+		die("rev-list-generate fork failed");
+	dup2(pipe_fd[1], 1);
+	close(pipe_fd[0]);
+	close(pipe_fd[1]);
+	close(fd);
+	builtin_rev_list_generate(refs);
+}
 static void pack_objects(int fd, struct ref *refs)
 {
 	pid_t rev_list_pid;
 
 	rev_list_pid = fork();
 	if (!rev_list_pid) {
-		rev_list(fd, refs);
+		rev_list_generate(fd, refs);
 		die("rev-list setup failed");
 	}
 	if (rev_list_pid < 0)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] rev list add option accepting revision constraints on standard input
  2006-09-05 21:51     ` [PATCH 1/2] rev list add option accepting revision constraints on standard input Andy Whitcroft
@ 2006-09-05 22:10       ` Junio C Hamano
  2006-09-06  1:01         ` Andy Whitcroft
  2006-09-06  4:46         ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2006-09-05 22:10 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: git

Andy Whitcroft <apw@shadowen.org> writes:

> Add a --stdin flag which causes rev-list to additionally read
> its stdin stream and parse that for revision constraints.

> +/*
> + * Parse revision information, filling in the "rev_info" structure,
> + * revisions are taken from stream.
> + */
> +static void setup_revisions_stream(FILE *stream, struct rev_info *revs)
> +{
> +	char line[1000];
> +	const char *args[] = { 0, line, 0 };
> +
> +	while (fgets(line, sizeof(line), stream) != NULL) {
> +		line[strlen(line) - 1] = 0;
> +
> +		if (line[0] == '-')
> +			die("options not supported in --stdin mode");
> +
> +		(void)setup_revisions(2, args, revs, NULL);
> +	}
> +}

Is calling setup_revisions() on the same revs like this many
times safe?  I do not think so, especially what is after the
primary "for()" loop in the function.

I was sort-of expecting that you would instead replace that
primary for() loop in setup_revisions() with some sort of
callback...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] rev list add option accepting revision constraints on standard input
  2006-09-05 22:10       ` Junio C Hamano
@ 2006-09-06  1:01         ` Andy Whitcroft
  2006-09-06  4:46         ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Andy Whitcroft @ 2006-09-06  1:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Andy Whitcroft <apw@shadowen.org> writes:
> 
>> Add a --stdin flag which causes rev-list to additionally read
>> its stdin stream and parse that for revision constraints.
> 
>> +/*
>> + * Parse revision information, filling in the "rev_info" structure,
>> + * revisions are taken from stream.
>> + */
>> +static void setup_revisions_stream(FILE *stream, struct rev_info *revs)
>> +{
>> +	char line[1000];
>> +	const char *args[] = { 0, line, 0 };
>> +
>> +	while (fgets(line, sizeof(line), stream) != NULL) {
>> +		line[strlen(line) - 1] = 0;
>> +
>> +		if (line[0] == '-')
>> +			die("options not supported in --stdin mode");
>> +
>> +		(void)setup_revisions(2, args, revs, NULL);
>> +	}
>> +}
> 
> Is calling setup_revisions() on the same revs like this many
> times safe?  I do not think so, especially what is after the
> primary "for()" loop in the function.
> 
> I was sort-of expecting that you would instead replace that
> primary for() loop in setup_revisions() with some sort of
> callback...

Heh, well I'll give it another poke tommorrow my time ...

Thanks for the feedback.  Its hard to get any context on something new
without trying and failing :).

-apw

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] rev list add option accepting revision constraints on standard input
  2006-09-05 22:10       ` Junio C Hamano
  2006-09-06  1:01         ` Andy Whitcroft
@ 2006-09-06  4:46         ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2006-09-06  4:46 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> Is calling setup_revisions() on the same revs like this many
> times safe?  I do not think so, especially what is after the
> primary "for()" loop in the function.
>
> I was sort-of expecting that you would instead replace that
> primary for() loop in setup_revisions() with some sort of
> callback...

I take half of the above back.  Even after setup_revisions()
returns, adding more revisions via add_pending_object() is
safe.  However, the postprocessing done in setup_revisions()
after its main loop, while I do not think they would crash when
called twice, would be very wasteful.

And ``callback'' interface is usually very cumbersome to use, so
we probably would want to avoid it unless absolutely necessary.

I've outlined an alternative implementation in two patches;
I'll be sending them out shortly.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-09-06  4:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-05 12:45 send-pack: limit on negative references Andy Whitcroft
2006-09-05 16:23 ` Junio C Hamano
2006-09-05 21:32   ` Andy Whitcroft
2006-09-05 21:51     ` [PATCH 1/2] rev list add option accepting revision constraints on standard input Andy Whitcroft
2006-09-05 22:10       ` Junio C Hamano
2006-09-06  1:01         ` Andy Whitcroft
2006-09-06  4:46         ` Junio C Hamano
2006-09-05 21:52     ` [PATCH 2/2] send pack switch to using git rev list stdin Andy Whitcroft

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).