git.vger.kernel.org archive mirror
From: Johan Herland <johan@herland.net>
To: Johannes Sixt <j.sixt@viscovery.net>, Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Peter Krefting <peter@softwolves.pp.se>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Alex Riesen <raa.lkml@gmail.com>
Subject: [PATCH v3] quickfetch(): Prevent overflow of the rev-list command line
Date: Thu, 09 Jul 2009 15:52:44 +0200
Message-ID: <200907091552.44545.johan@herland.net>
In-Reply-To: <4A55E100.9010700@viscovery.net>

quickfetch() calls rev-list to check whether the objects we are about to
fetch are already present in the repo (if so, we can skip the object fetch).
However, when there are many (~1000) refs to be fetched, the rev-list
command line grows larger than the maximum command line size on some systems
(32K on Windows). This causes rev-list to fail, making quickfetch() return
non-zero, which unnecessarily triggers the transport machinery. This somehow
causes fetch to fail with a non-zero exit code.

By using the --stdin option to rev-list (and feeding the object list to its
standard input), we prevent the overflow of the rev-list command line,
which causes quickfetch(), and subsequently the overall fetch, to succeed.

However, using rev-list --stdin is not entirely straightforward: rev-list
terminates immediately when encountering an unknown object, which can
trigger SIGPIPE if we are still writing objects to its standard input.
We therefore temporarily ignore SIGPIPE so that the fetch process is not
terminated.

The patch also contains a testcase to verify the fix (note that before
the patch, the testcase would only fail on msysGit).

Signed-off-by: Johan Herland <johan@herland.net>
Improved-by: Johannes Sixt <j6t@kdbg.org>
Improved-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Peter Krefting <peter@softwolves.pp.se>
---

On Thursday 09 July 2009, Johannes Sixt wrote:
> Would you please add such a test (perhaps in t5502)? It
> would also help me verify the patch works as intended on Windows.

Done (although somewhat naively). I don't have an msysgit setup to test
this, but faking the failure condition in quickfetch() (return -1 if
#refs > 800) does trigger the selftest (the second git fetch fails).
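
To illustrate why ~800 is a plausible threshold, here is a rough
back-of-the-envelope sketch (not part of the patch). It assumes the 32K
limit mentioned in the commit message, one 40-character hex SHA-1 plus a
separating space per ref, and a simplified fixed prefix.
revlist_cmdline_len() and cmdline_overflows() are hypothetical names, and
the real limit accounting on Windows may well differ:

```c
#include <stddef.h>
#include <string.h>

/* Assumed limit, taken from the "32K" figure in the commit message. */
#define CMDLINE_LIMIT ((size_t)32 * 1024)

/*
 * Estimated size of the old-style command line: the fixed arguments
 * plus one 40-character hex SHA-1 and a separating space per ref.
 * The fixed prefix is a simplification (no path to the git binary).
 */
size_t revlist_cmdline_len(size_t nr_refs)
{
	static const char fixed[] = "git rev-list --quiet --objects --not --all";

	return strlen(fixed) + nr_refs * (40 + 1);
}

/* Non-zero if the estimated command line exceeds the assumed limit. */
int cmdline_overflows(size_t nr_refs)
{
	return revlist_cmdline_len(nr_refs) > CMDLINE_LIMIT;
}
```

Under these assumptions, 1000 refs need 41042 bytes, well past 32768,
while 700 refs still fit.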

I could add a separate pre-patch introducing the selftest with
test_expect_failure, but that would only confuse non-msysgit users
where the test succeeds both before and after the fix.

> Please make this <j6t@kdbg.org> despite the email address I'm using right
> now.

Ok.

> The call site of quickfetch() is not interested in the errno, only on
> whether the return value is non-zero: You can just assign -1 to err
> (that's our convention for failure). OTOH, it would be helpful to include
> strerror(errno) in the error message.

Fixed.
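
For illustration only, the convention can be sketched outside of git as
follows. This is a standalone approximation, not the patch code:
write_all() is a hypothetical stand-in for write_in_full(), feed_ids() is
a made-up name, and fprintf() replaces git's error(). As in the patch,
EPIPE and EINVAL are treated as "the reader went away" and are not
reported:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for git's write_in_full(): keep calling
 * write() until everything is written or a real error occurs.
 */
static ssize_t write_all(int fd, const char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Feed one 40-character hex id per line to fd. Returns 0 on success
 * and -1 on any failure; the caller only looks at zero vs. non-zero.
 */
int feed_ids(int fd, const char *const *ids, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (write_all(fd, ids[i], 40) < 0 ||
		    write_all(fd, "\n", 1) < 0) {
			if (errno != EPIPE && errno != EINVAL)
				fprintf(stderr, "error: failed write: %s\n",
					strerror(errno));
			return -1;
		}
	}
	return 0;
}
```

The caller passes the write end of the pipe connected to the child's
standard input and closes it afterwards, so that the reader sees EOF.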

> Shouldn't you reset signal(SIGPIPE) to its previous value?

Done (provided that the sigchain_push/pop infrastructure works the way
I expect).
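
The behaviour I am relying on can be sketched in plain POSIX like this.
It is a standalone illustration, not the patch code: it uses signal()
directly where the patch uses sigchain_push()/sigchain_pop(), and
write_to_closed_pipe() is a made-up name. The reader that has already
exited is simulated by closing the read end of a pipe:

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write to a pipe whose read end is already closed, with SIGPIPE
 * ignored for the duration of the write. Returns the errno set by
 * write() (EPIPE is expected), 0 if the write unexpectedly succeeded,
 * or -1 if the pipe could not be created.
 */
int write_to_closed_pipe(void)
{
	int fds[2];
	int saved_errno = 0;
	void (*old_handler)(int);

	if (pipe(fds) < 0)
		return -1;
	close(fds[0]);	/* simulate a reader that has exited */

	/* Save the previous disposition and ignore SIGPIPE. */
	old_handler = signal(SIGPIPE, SIG_IGN);

	/* Without SIG_IGN this write would kill the process. */
	if (write(fds[1], "x\n", 2) < 0)
		saved_errno = errno;

	/* Restore whatever disposition was in place before. */
	signal(SIGPIPE, old_handler);

	close(fds[1]);
	return saved_errno;
}
```

With SIGPIPE ignored, the write fails with EPIPE instead of terminating
the process, and the previous handler is back in place afterwards.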

Thanks a lot for your review and suggestions.


Have fun! :)

...Johan


 builtin-fetch.c       |   65 ++++++++++++++++++++++++++++--------------------
 t/t5502-quickfetch.sh |   20 +++++++++++++++
 2 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/builtin-fetch.c b/builtin-fetch.c
index cd5eb9a..2e3c609 100644
--- a/builtin-fetch.c
+++ b/builtin-fetch.c
@@ -400,14 +400,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 /*
  * We would want to bypass the object transfer altogether if
- * everything we are going to fetch already exists and connected
+ * everything we are going to fetch already exists and is connected
  * locally.
  *
- * The refs we are going to fetch are in to_fetch (nr_heads in
- * total).  If running
+ * The refs we are going to fetch are in ref_map.  If running
  *
- *  $ git rev-list --objects to_fetch[0] to_fetch[1] ... --not --all
+ *  $ git rev-list --objects --stdin --not --all
  *
+ * (feeding all the refs in ref_map on its standard input)
  * does not error out, that means everything reachable from the
  * refs we are going to fetch exists and is connected to some of
  * our existing refs.
@@ -416,8 +416,9 @@ static int quickfetch(struct ref *ref_map)
 {
 	struct child_process revlist;
 	struct ref *ref;
-	char **argv;
-	int i, err;
+	int err;
+	const char *argv[] = {"rev-list",
+		"--quiet", "--objects", "--stdin", "--not", "--all", NULL};
 
 	/*
 	 * If we are deepening a shallow clone we already have these
@@ -429,34 +430,44 @@ static int quickfetch(struct ref *ref_map)
 	if (depth)
 		return -1;
 
-	for (i = 0, ref = ref_map; ref; ref = ref->next)
-		i++;
-	if (!i)
+	if (!ref_map)
 		return 0;
 
-	argv = xmalloc(sizeof(*argv) * (i + 6));
-	i = 0;
-	argv[i++] = xstrdup("rev-list");
-	argv[i++] = xstrdup("--quiet");
-	argv[i++] = xstrdup("--objects");
-	for (ref = ref_map; ref; ref = ref->next)
-		argv[i++] = xstrdup(sha1_to_hex(ref->old_sha1));
-	argv[i++] = xstrdup("--not");
-	argv[i++] = xstrdup("--all");
-	argv[i++] = NULL;
-
 	memset(&revlist, 0, sizeof(revlist));
-	revlist.argv = (const char**)argv;
+	revlist.argv = argv;
 	revlist.git_cmd = 1;
-	revlist.no_stdin = 1;
 	revlist.no_stdout = 1;
 	revlist.no_stderr = 1;
-	err = run_command(&revlist);
+	revlist.in = -1;
+
+	err = start_command(&revlist);
+	if (err) {
+		error("could not run rev-list");
+		return err;
+	}
+
+	/* If rev-list --stdin encounters an unknown commit, it terminates,
+	 * which will cause SIGPIPE in the write loop below. */
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	for (ref = ref_map; ref; ref = ref->next) {
+		if (write_in_full(revlist.in, sha1_to_hex(ref->old_sha1), 40) < 0 ||
+		    write_in_full(revlist.in, "\n", 1) < 0) {
+			if (errno != EPIPE && errno != EINVAL)
+				error("failed write to rev-list: %s", strerror(errno));
+			err = -1;
+			break;
+		}
+	}
+
+	if (close(revlist.in)) {
+		error("failed to close rev-list's stdin: %s", strerror(errno));
+		err = -1;
+	}
+
+	sigchain_pop(SIGPIPE);
 
-	for (i = 0; argv[i]; i++)
-		free(argv[i]);
-	free(argv);
-	return err;
+	return finish_command(&revlist) || err;
 }
 
 static int fetch_refs(struct transport *transport, struct ref *ref_map)
diff --git a/t/t5502-quickfetch.sh b/t/t5502-quickfetch.sh
index 16eadd6..1037a72 100755
--- a/t/t5502-quickfetch.sh
+++ b/t/t5502-quickfetch.sh
@@ -119,4 +119,24 @@ test_expect_success 'quickfetch should not copy from alternate' '
 
 '
 
+test_expect_success 'quickfetch should handle ~1000 refs (on Windows)' '
+
+	git gc &&
+	head=$(git rev-parse HEAD) &&
+	branchprefix="$head refs/heads/branch" &&
+	for i in 0 1 2 3 4 5 6 7 8 9; do
+		for j in 0 1 2 3 4 5 6 7 8 9; do
+			for k in 0 1 2 3 4 5 6 7 8 9; do
+				echo "$branchprefix$i$j$k" >> .git/packed-refs
+			done
+		done
+	done &&
+	(
+		cd cloned &&
+		git fetch &&
+		git fetch
+	)
+
+'
+
 test_done
-- 
1.6.3.rc0.1.gf800
