git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ivan Todoroski <grnch@gmx.net>
To: Jeff King <peff@peff.net>
Cc: Shawn Pearce <spearce@spearce.org>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Jakub Narebski <jnareb@gmail.com>,
	git@vger.kernel.org
Subject: [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin
Date: Tue, 27 Mar 2012 08:25:25 +0200	[thread overview]
Message-ID: <4F715D55.8020109@gmx.net> (raw)
In-Reply-To: <4F715CF7.5070903@gmx.net>

If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.

This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.

Here is an easy way to trigger this problem:

git init too-many-refs
cd too-many-refs
echo bla > bla.txt
git add .
git commit -m test
sha=$(git rev-parse HEAD)
for ((i=0; i<50000; i++)); do
	echo $sha refs/tags/artificially-long-tag-name-to-more-easily-\
demonstrate-the-problem-$i >> .git/packed-refs
done

Then share this repo over the smart HTTP protocol and try cloning it:

	$ git clone http://localhost/.../too-many-refs/.git
	Cloning into 'too-many-refs'...
	fatal: cannot exec 'fetch-pack': Argument list too long

50k tags is obviously an absurd number, but it is required to
demonstrate the problem on Linux because it has a much more generous
command line limit. On Windows the clone fails with as little as 500
tags in the above loop, which is getting uncomfortably close to the
number of tags you might see in real long lived repos.

This is not just theoretical, msysGit is already failing to clone our
company repo due to this. It's a large repo converted from CVS, nearly
10 years of history.

Four possible solutions were discussed on the Git mailing list (in no
particular order):

1) Call fetch-pack multiple times with smaller batches of refs.

This was dismissed as inefficient and inelegant.

2) Add option --refs-fd=$n to pass a an fd from where to read the refs.

This was rejected because inheriting descriptors other than
stdin/stdout/stderr through exec() is apparently problematic on Windows,
plus it would require changes to the run-command API to open extra
pipes.

3) Add option --refs-from=$tmpfile to pass the refs using a temp file.

This was not favored because of the temp file requirement.

4) Add option --stdin to pass the refs on stdin, one per line.

In the end this option was chosen as the most efficient and most
desirable from scripting perspective.

There was however a small complication when using stdin to pass refs to
fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
communication with the remote server.

If we were going to sneak refs on stdin line by line in the presence of
--stateless-rpc it would have to be done very carefully, because when
reading refs one by line we might buffer too much data ahead and eat
some of the remote protocol data also coming on stdin.

One way to solve this would be refactor get_remote_heads() in
fetch-pack.c to accept a residual buffer from our stdin line parsing
above, but this function is used in several places so other callers
would be burdened by this residual buffer interface even when most of
them don't need it.

In the end we settled on the following solution:

If --stdin is specified without --stateless-rpc, fetch-pack would read
the refs from stdin one per line, in a script friendly format.

However if --stdin is specified together with --stateless-rpc,
fetch-pack would read the refs from stdin in packetized format
(pkt-line) with a flush packet terminating the list of refs. This way we
can read the exact number of bytes that we need from stdin, and then
get_remote_heads() can continue reading from the same fd without losing
a single byte of remote protocol data.

This way the --stdin option only loses generality and scriptability when
used together with --stateless-rpc, which is not easily scriptable
anyway because it also uses pkt-line when talking to the remote server.
---
 Documentation/git-fetch-pack.txt |   10 +++++++++
 builtin/fetch-pack.c             |   45 +++++++++++++++++++++++++++++++++++++-
 fetch-pack.h                     |    3 ++-
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index ed1bdaacd1..1dd44fd348 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -32,6 +32,16 @@ OPTIONS
 --all::
 	Fetch all remote refs.
 
+--stdin::
+	Take the list of refs from stdin, one per line. If there
+	are refs specified on the command line in addition to this
+	option, then the refs from stdin are processed after those
+	on the command line.
++
+If '--stateless-rpc' is specified together with this option then
+the list of refs must be in packet format (pkt-line) with a flush
+packet terminating the list.
+
 -q::
 --quiet::
 	Pass '-q' flag to 'git unpack-objects'; this makes the
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index a4d3e90a86..1a90fa852f 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -23,7 +23,9 @@ static struct fetch_pack_args args = {
 };
 
 static const char fetch_pack_usage[] =
-"git fetch-pack [--all] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
+"git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] "
+"[--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] "
+"[--no-progress] [-v] [<host>:]<directory> [<refs>...]";
 
 #define COMPLETE	(1U << 0)
 #define COMMON		(1U << 1)
@@ -941,6 +943,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				args.fetch_all = 1;
 				continue;
 			}
+			if (!strcmp("--stdin", arg)) {
+				args.refs_from_stdin = 1;
+				continue;
+			}
 			if (!strcmp("-v", arg)) {
 				args.verbose = 1;
 				continue;
@@ -972,6 +978,43 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	if (!dest)
 		usage(fetch_pack_usage);
 
+	if (args.refs_from_stdin) {
+		/* copy refs from cmdline to new growable list,
+		   then append the refs from stdin */
+		int alloc_heads = nr_heads;
+		int size = nr_heads * sizeof(*heads);
+		heads = memcpy(xmalloc(size), heads, size);
+		if (args.stateless_rpc) {
+			/* in stateless RPC mode we use pkt-line to read
+			   from stdin, until we get a flush packet */
+			static char line[1000];
+			for (;;) {
+				int n = packet_read_line(0, line, sizeof(line));
+				if (!n)
+					break;
+				if (line[n-1] == '\n')
+					line[--n] = '\0';
+				ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
+				heads[nr_heads++] = xmemdupz(line, n);
+			}
+		}
+		else {
+			/* read from stdin one ref per line, until EOF */
+			struct strbuf line;
+			strbuf_init(&line, 0);
+			for (;;) {
+				if (strbuf_getline(&line, stdin, '\n') == EOF)
+					break;
+				strbuf_trim(&line);
+				if (!line.len)
+					continue; /* skip empty lines */
+				ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
+				heads[nr_heads++] = strbuf_detach(&line, NULL);
+			}
+			strbuf_release(&line);
+		}
+	}
+
 	if (args.stateless_rpc) {
 		conn = NULL;
 		fd[0] = 0;
diff --git a/fetch-pack.h b/fetch-pack.h
index 0608edae3f..292d69389e 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -13,7 +13,8 @@ struct fetch_pack_args {
 		verbose:1,
 		no_progress:1,
 		include_tag:1,
-		stateless_rpc:1;
+		stateless_rpc:1,
+		refs_from_stdin:1;
 };
 
 struct ref *fetch_pack(struct fetch_pack_args *args,
-- 
1.7.9.5.4.g4f508

  reply	other threads:[~2012-03-27  6:25 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-18  8:14 Clone fails on a repo with too many heads/tags Ivan Todoroski
2012-03-18 11:37 ` Ivan Todoroski
2012-03-18 12:04   ` Nguyen Thai Ngoc Duy
2012-03-18 16:36 ` Jakub Narebski
2012-03-18 19:07   ` Jeff King
2012-03-18 22:07     ` Jakub Narebski
2012-03-19  2:32       ` Jeff King
2012-03-19  2:43         ` Nguyen Thai Ngoc Duy
2012-03-19  2:45           ` Jeff King
2012-03-19  1:05     ` Ivan Todoroski
2012-03-19  1:30     ` Nguyen Thai Ngoc Duy
2012-03-19  2:44       ` Jeff King
2012-03-21 11:05         ` Ivan Todoroski
2012-03-21 14:28           ` Shawn Pearce
2012-03-21 17:14             ` Jeff King
2012-03-21 17:59               ` Jakub Narebski
2012-03-21 20:02               ` Ivan Todoroski
2012-03-21 20:17                 ` Jeff King
2012-03-24 20:49                   ` Ivan Todoroski
2012-03-25  1:06                     ` Jeff King
2012-03-25  2:32                       ` Jeff King
2012-03-25 17:33                         ` Ivan Todoroski
2012-03-25 17:54                           ` Ivan Todoroski
2012-03-26 17:33                             ` Jeff King
2012-03-27  7:07                               ` Ivan Todoroski
2012-03-25 15:30                       ` Ivan Todoroski
2012-03-24 20:53                   ` [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin Ivan Todoroski
2012-03-25  1:19                     ` Jeff King
2012-03-25  9:39                       ` Ivan Todoroski
2012-03-25 15:15                         ` Ivan Todoroski
2012-03-25 20:00                       ` Ivan Todoroski
2012-03-26 17:21                         ` Jeff King
2012-03-26 17:49                           ` Ivan Todoroski
2012-03-26 17:51                             ` Jeff King
2012-03-24 20:54                   ` [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
2012-03-25  1:24                     ` Jeff King
2012-03-25  9:52                       ` Ivan Todoroski
2012-03-26 17:24                         ` Jeff King
2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
2012-03-27  6:25                 ` Ivan Todoroski [this message]
2012-03-27 16:59                   ` [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin Junio C Hamano
2012-03-27 23:18                     ` Ivan Todoroski
2012-03-27 23:26                       ` Junio C Hamano
2012-03-27 23:48                         ` Ivan Todoroski
2012-03-27  6:26                 ` [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
2012-03-27 17:18                   ` Junio C Hamano
2012-03-27 23:20                     ` Ivan Todoroski
2012-03-27  6:27                 ` [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option Ivan Todoroski
2012-03-27 17:40                   ` Junio C Hamano
2012-03-27 23:36                     ` Ivan Todoroski
2012-03-27 23:43                       ` Junio C Hamano
2012-03-28  0:14                       ` Ivan Todoroski
2012-03-27  6:28                 ` [PATCH/RFC v2 4/4] remote-curl: main test case for the OS command line overflow Ivan Todoroski
2012-03-27 17:43                   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F715D55.8020109@gmx.net \
    --to=grnch@gmx.net \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).