git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Git cannot push to repository with too many tags/heads
@ 2006-02-21 18:46 Stephen C. Tweedie
  2006-02-21 18:48 ` [PATCH] Don't sent objects for refs we're not going to update Stephen C. Tweedie
  2006-02-22  1:59 ` Git cannot push to repository with too many tags/heads Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Stephen C. Tweedie @ 2006-02-21 18:46 UTC (permalink / raw)
  To: git mailing list; +Cc: Stephen Tweedie

Hi,

I'm using git's cvs import to track the Fedora kernel rpm repository,
and it has just passed a watermark that git cannot currently cope with.

The problem is the large number of branches and tags on this repository.
A tag for every build means that the number builds up rapidly, and
git-send-pack has an internal limit of 900 refs to query when building a
pack.

I hoped this wouldn't be a problem.  In theory, I should just be able to
push to a single refspec and avoid that limit entirely.  But no ---
unfortunately, git is determined to push out the objects for *every* ref
that matches between src and dst when building the pack, even if it's
not actually going to update the remote ref.  And in building the
git-rev-list argument list for that colossal update, it exceeds the
internal limit of 900 refs in exec_rev_list().

I think exec_rev_list() is doing the wrong thing here.  If I specify an
explicit refspec list for git-send-pack, then when send_pack() calls
match_refs(), we break out to match_explicit_refs() and set
dst->peer_ref only for the ref[s] which match the refspec.  send_pack()
then skips all other refs by doing a

		if (!ref->peer_ref)
			continue;

Unfortunately, exec_rev_list() is missing this, and it tries to ask
git-rev-list for the commit objects of *every* ref on the remote_refs
list, even if they have been explicitly excluded by match_refs() and
have no peer_ref.  So with this huge repository, I can't even push a
single refspec without bumping into the limit of 900 refs.

The immediate consequence to me is that I can't push to this particular
repository.  But if I'm following the code right, a consequence is that
git-send-pack is accidentally sending all objects for refs that the
remote end is prepared to receive, even if we've supplied a refspec
asking for only a subset of those refs to actually be updated.

I think we can fix this just by adding a test for ref->peer_ref to
exec_rev_list().  That seems to fix the immediate problem for me, at
least.  Patch to follow.

--Stephen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] Don't sent objects for refs we're not going to update.
  2006-02-21 18:46 Git cannot push to repository with too many tags/heads Stephen C. Tweedie
@ 2006-02-21 18:48 ` Stephen C. Tweedie
  2006-02-22  1:59 ` Git cannot push to repository with too many tags/heads Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Stephen C. Tweedie @ 2006-02-21 18:48 UTC (permalink / raw)
  To: git mailing list; +Cc: Stephen Tweedie

send_pack() sends only those refs we've asked to be updated on the
destination---either via an explicit refspec or by matching local and
remote refs.  But rev_list() builds an object list for *all* refs it
can find.  For a tree with many tags/heads, this means that it is
impossible to push updates even to a single refspec, as exec_rev_list()
overflows its arg length limit.

Fix this by skipping refs with no peer_ref set in exec_rev_list().
send_pack() already skips it when sending refs; we need to skip it
when building the object list for the pack too.

Signed-off-by: Stephen Tweedie <sct@redhat.com>

---

 send-pack.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

4aa39c0920eea37987ca8a6b10861da7a87b5c14
diff --git a/send-pack.c b/send-pack.c
index 990be3f..d2a39d9 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -42,8 +42,10 @@ static void exec_rev_list(struct ref *re
 
 	args[i++] = "rev-list";	/* 0 */
 	args[i++] = "--objects";	/* 1 */
-	while (refs) {
+	for (; refs; refs = refs->next) {
 		char *buf = malloc(100);
+		if (!refs->peer_ref)
+			continue;
 		if (i > 900)
 			die("git-rev-list environment overflow");
 		if (!is_zero_sha1(refs->old_sha1) &&
@@ -56,7 +58,6 @@ static void exec_rev_list(struct ref *re
 			args[i++] = buf;
 			snprintf(buf, 50, "%s", sha1_to_hex(refs->new_sha1));
 		}
-		refs = refs->next;
 	}
 	args[i] = NULL;
 	execv_git_cmd(args);
-- 
1.2.2.g6643-dirty

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Git cannot push to repository with too many tags/heads
  2006-02-21 18:46 Git cannot push to repository with too many tags/heads Stephen C. Tweedie
  2006-02-21 18:48 ` [PATCH] Don't sent objects for refs we're not going to update Stephen C. Tweedie
@ 2006-02-22  1:59 ` Junio C Hamano
  2006-02-22  2:56   ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2006-02-22  1:59 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: git

"Stephen C. Tweedie" <sct@redhat.com> writes:

> send_pack()
> then skips all other refs by doing a
>
> 		if (!ref->peer_ref)
> 			continue;
>
> Unfortunately, exec_rev_list() is missing this, and it tries to ask
> git-rev-list for the commit objects of *every* ref on the remote_refs
> list, even if they have been explicitly excluded by match_refs() and
> have no peer_ref.  So with this huge repository, I can't even push a
> single refspec without bumping into the limit of 900 refs.

IIRC, the distinction was deliberate.  send_pack() excludes
what did not match because it does not want to send them.
rev_list() adds what we know they have to "do not bother to
send" list to make the resulting pack smaller.  The time where
it matters most is when you are pushing a new branch head (or a
tag).

I think the exec_rev_list logic should be taught to first
include all the positive refs (i.e. the ones we are going to
send), and then as many the negative refs (i.e. the ones we know
they have), from newer to older, as they fit without triggering
"argument list too long".

Another, probably conceptually cleaner alternative might be to
allow rev-list to read from its stdin so that we do not have to
worry about the argument list issues.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Git cannot push to repository with too many tags/heads
  2006-02-22  1:59 ` Git cannot push to repository with too many tags/heads Junio C Hamano
@ 2006-02-22  2:56   ` Junio C Hamano
  2006-02-22  9:51     ` [PATCH] send-pack: do not give up when remote has insanely large number of refs Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2006-02-22  2:56 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> "Stephen C. Tweedie" <sct@redhat.com> writes:
>
>> send_pack()
>> then skips all other refs by doing a
>>
>> 		if (!ref->peer_ref)
>> 			continue;
>>
>> Unfortunately, exec_rev_list() is missing this, and it tries to ask
>> git-rev-list for the commit objects of *every* ref on the remote_refs
>> list, even if they have been explicitly excluded by match_refs() and
>> have no peer_ref.  So with this huge repository, I can't even push a
>> single refspec without bumping into the limit of 900 refs.
>
> IIRC, the distinction was deliberate.  send_pack() excludes
> what did not match because it does not want to send them.
> rev_list() adds what we know they have to "do not bother to
> send" list to make the resulting pack smaller.  The time where
> it matters most is when you are pushing a new branch head (or a
> tag).
>
> I think the exec_rev_list logic should be taught to first
> include all the positive refs (i.e. the ones we are going to
> send), and then as many the negative refs (i.e. the ones we know
> they have), from newer to older, as they fit without triggering
> "argument list too long".

That is, something like this.
-- >8 --
Do not give up running rev-list when remote has insanely large number of refs.

---
cd /opt/packrat/playpen/public/in-place/git/git.junio/
git diff
diff --git a/send-pack.c b/send-pack.c
index 990be3f..cfd0eeb 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -37,24 +37,27 @@ static void exec_pack_objects(void)
 
 static void exec_rev_list(struct ref *refs)
 {
+	struct ref *ref;
 	static char *args[1000];
 	int i = 0;
 
 	args[i++] = "rev-list";	/* 0 */
 	args[i++] = "--objects";	/* 1 */
-	while (refs) {
-		char *buf = malloc(100);
+	for (ref = refs; refs; refs = refs->next) {
+		char *buf = malloc(41);
 		if (i > 900)
 			die("git-rev-list environment overflow");
-		if (!is_zero_sha1(refs->old_sha1) &&
-		    has_sha1_file(refs->old_sha1)) {
+		if (!is_zero_sha1(refs->new_sha1)) {
 			args[i++] = buf;
-			snprintf(buf, 50, "^%s", sha1_to_hex(refs->old_sha1));
-			buf += 50;
+			snprintf(buf, 41, "%s", sha1_to_hex(refs->new_sha1));
 		}
-		if (!is_zero_sha1(refs->new_sha1)) {
+	}
+	for (ref = refs; i < 900 && refs; refs = refs->next) {
+		char *buf = malloc(42);
+		if (!is_zero_sha1(refs->old_sha1) &&
+		    has_sha1_file(refs->old_sha1)) {
 			args[i++] = buf;
-			snprintf(buf, 50, "%s", sha1_to_hex(refs->new_sha1));
+			snprintf(buf, 42, "^%s", sha1_to_hex(refs->old_sha1));
 		}
 		refs = refs->next;
 	}

Compilation finished at Tue Feb 21 18:52:49

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH] send-pack: do not give up when remote has insanely large number of refs.
  2006-02-22  2:56   ` Junio C Hamano
@ 2006-02-22  9:51     ` Junio C Hamano
  2006-02-22 18:30       ` Stephen C. Tweedie
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2006-02-22  9:51 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: git

Stephen C. Tweedie noticed that we give up running rev-list when
we see too many refs on the remote side.  Limit the number of
negative references we give to rev-list and continue.

Not sending any negative references to rev-list is very bad --
we may be pushing a ref that is new to the other end.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

  Junio C Hamano <junkio@cox.net> writes:

  > That is, something like this.
  > -- >8 --
  > Do not give up...

  Ah, crap.  Sorry about that.  This one has been tested at least.

 send-pack.c |   38 ++++++++++++++++++++++++++++----------
 1 files changed, 28 insertions(+), 10 deletions(-)

797656e58ddbd82ac461a5142ed726db3a4d0ac0
diff --git a/send-pack.c b/send-pack.c
index 990be3f..b58bbab 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -37,26 +37,44 @@ static void exec_pack_objects(void)
 
 static void exec_rev_list(struct ref *refs)
 {
+	struct ref *ref;
 	static char *args[1000];
-	int i = 0;
+	int i = 0, j;
 
 	args[i++] = "rev-list";	/* 0 */
 	args[i++] = "--objects";	/* 1 */
-	while (refs) {
-		char *buf = malloc(100);
-		if (i > 900)
+
+	/* First send the ones we care about most */
+	for (ref = refs; ref; ref = ref->next) {
+		if (900 < i)
 			die("git-rev-list environment overflow");
-		if (!is_zero_sha1(refs->old_sha1) &&
-		    has_sha1_file(refs->old_sha1)) {
+		if (!is_zero_sha1(ref->new_sha1)) {
+			char *buf = malloc(100);
 			args[i++] = buf;
-			snprintf(buf, 50, "^%s", sha1_to_hex(refs->old_sha1));
+			snprintf(buf, 50, "%s", sha1_to_hex(ref->new_sha1));
 			buf += 50;
+			if (!is_zero_sha1(ref->old_sha1) &&
+			    has_sha1_file(ref->old_sha1)) {
+				args[i++] = buf;
+				snprintf(buf, 50, "^%s",
+					 sha1_to_hex(ref->old_sha1));
+			}
 		}
-		if (!is_zero_sha1(refs->new_sha1)) {
+	}
+
+	/* Then a handful of the remainder
+	 * NEEDSWORK: we would be better off if used the newer ones first.
+	 */
+	for (ref = refs, j = i + 16;
+	     i < 900 && i < j && ref;
+	     ref = ref->next) {
+		if (is_zero_sha1(ref->new_sha1) &&
+		    !is_zero_sha1(ref->old_sha1) &&
+		    has_sha1_file(ref->old_sha1)) {
+			char *buf = malloc(42);
 			args[i++] = buf;
-			snprintf(buf, 50, "%s", sha1_to_hex(refs->new_sha1));
+			snprintf(buf, 42, "^%s", sha1_to_hex(ref->old_sha1));
 		}
-		refs = refs->next;
 	}
 	args[i] = NULL;
 	execv_git_cmd(args);
-- 
1.2.2.g882e

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] send-pack: do not give up when remote has insanely large number of refs.
  2006-02-22  9:51     ` [PATCH] send-pack: do not give up when remote has insanely large number of refs Junio C Hamano
@ 2006-02-22 18:30       ` Stephen C. Tweedie
  2006-02-22 18:42         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen C. Tweedie @ 2006-02-22 18:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Wed, 2006-02-22 at 01:51 -0800, Junio C Hamano wrote:

> +	for (ref = refs, j = i + 16;
> +	     i < 900 && i < j && ref;

Looks like it's now sending just 16 additional negative refs instead of
940 for this repo.  Definitely an improvement --- push (both full and
with an explicit refspec) is now working properly, thanks!  

Adding more ^refs up to the limit of 900 should be possible, too, and
should catch more already-present objects --- while the refs count for
this repo was under 900, push still worked fine for me, so we don't
necessarily have to cut it hard to as low a number as 16.

Perhaps ultimately we may want to simply send the refs list to
git-rev-list via a pipe or similar if we want this to scale?  We'll need
this for edge cases such as sending >900 new tags to an old repository
at once, as we'll exhaust the size of the positive refs list in that
case.

Thanks again,
 Stephen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] send-pack: do not give up when remote has insanely large number of refs.
  2006-02-22 18:30       ` Stephen C. Tweedie
@ 2006-02-22 18:42         ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2006-02-22 18:42 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: git

"Stephen C. Tweedie" <sct@redhat.com> writes:

> Adding more ^refs up to the limit of 900 should be possible, too, and
> should catch more already-present objects --- while the refs count for
> this repo was under 900, push still worked fine for me, so we don't
> necessarily have to cut it hard to as low a number as 16.

Actually it was done on purpose.  My first round did not have
that 16 limit, and then it turned out that rev-list, given extra
work to cull those 900 negative refs, were spending a lot of
time.  I created a repository with 1500 tags for this test ;-).

Unless we were talking about a repository that happens to host
1000 unrelated projects, each tracked with separate sets of
heads and tags, those negative refs should be related with each
other in the ancestry graph.  We should be able to eliminate
most of them by using the latest handful.

> Perhaps ultimately we may want to simply send the refs list to
> git-rev-list via a pipe or similar if we want this to scale?  We'll need
> this for edge cases such as sending >900 new tags to an old repository
> at once, as we'll exhaust the size of the positive refs list in that
> case.

True.  We should support unbounded number of positive refs.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-02-22 18:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-21 18:46 Git cannot push to repository with too many tags/heads Stephen C. Tweedie
2006-02-21 18:48 ` [PATCH] Don't sent objects for refs we're not going to update Stephen C. Tweedie
2006-02-22  1:59 ` Git cannot push to repository with too many tags/heads Junio C Hamano
2006-02-22  2:56   ` Junio C Hamano
2006-02-22  9:51     ` [PATCH] send-pack: do not give up when remote has insanely large number of refs Junio C Hamano
2006-02-22 18:30       ` Stephen C. Tweedie
2006-02-22 18:42         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).