git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4] git clone: is an URL local or ssh
@ 2013-11-04 21:04 Torsten Bögershausen
  0 siblings, 0 replies; 9+ messages in thread
From: Torsten Bögershausen @ 2013-11-04 21:04 UTC (permalink / raw)
  To: git

Commit 8d3d28f5 added test cases for URLs which should be ssh.

Add more tests testing all the combinations:
 -IPv4 or IPv6
 -path starting with "/" or with "/~"
 -with and without the ssh:// scheme

Add tests for ssh:// with port number.

When a git repository "foo:bar" exist, git clone will call
absolute_path() and git_connect() will be called with
something like "/home/user/projects/foo:bar"

Tighten the test and use "foo:bar" instead of "./foo:bar",
refactor clear_ssh() and expect_ssh() into test_clone_url().

"git clone foo/bar:baz" should not be ssh:
  Make the rule
  "if there is a slash before the first colon, it is not ssh"
  more strict by using the same is_local() in both connect.c
  and transport.c. Add a test case.

Bug fixes for corner cases:
- Handle clone of [::1]:/~repo correctly (2e7766655):
  Git should call "ssh ::1 ~repo", not ssh ::1 /~repo
  This was caused by looking at (host != url), which never
  worked for host names with literal IPv6 adresses, embedded by []
  Support for the [] URLs was introduced in 356bece0a.

- Do not tamper local URLs starting with '[' which have a ']'

Bug fix for msygit in t5601 : use $PWD insted of $(pwd)

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 Changes since V3:
 - Integrated Peffs suggestions in t5601
   (Remove clear_ssh)
 - Decide early if it is ssl or local in connect.c
 - Use "git fetch" instead of "git clone" in t5601:
   clone use absolute_path() (adding a / before :), fetch does not
 - Add a test for dos_drive "C:temp" for msys
 - Make tests work for msys by using $PWD instead of $(pwd) (file://$PWD)

 connect.c        |  50 ++++++++++++++++--------
 connect.h        |   1 +
 t/t5601-clone.sh | 117 ++++++++++++++++++++++++++++++++++++++++++++-----------
 transport.c      |   8 ----
 4 files changed, 130 insertions(+), 46 deletions(-)

diff --git a/connect.c b/connect.c
index 06e88b0..022d122 100644
--- a/connect.c
+++ b/connect.c
@@ -547,6 +547,25 @@ static char *get_port(char *host)
 
 static struct child_process no_fork;
 
+int is_local(const char *url)
+{
+	const char *colon = strchr(url, ':');
+	const char *slash = strchr(url, '/');
+	if (has_dos_drive_prefix(url))
+		return 1;
+	if (!colon)
+		return 1;
+	if (slash && slash < colon)
+		return 1;
+	if (url[0] == '[') {
+		const char *end = strchr(url + 1, ']');
+		if (!end)
+			return 1;
+		return is_local(end);
+	}
+	return 0;
+}
+
 /*
  * This returns a dummy child_process if the transport protocol does not
  * need fork(2), or a struct child_process object if it does.  Once done,
@@ -564,7 +583,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	char *url;
 	char *host, *path;
 	char *end;
-	int c;
+	int separator;
 	struct child_process *conn = &no_fork;
 	enum protocol protocol = PROTO_LOCAL;
 	int free_path = 0;
@@ -587,17 +606,19 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 		*host = '\0';
 		protocol = get_protocol(url);
 		host += 3;
-		c = '/';
+		separator = '/';
 	} else {
 		host = url;
-		c = ':';
+		protocol = is_local(url) ? PROTO_LOCAL : PROTO_SSH;
+		separator = ':';
 	}
 
 	/*
 	 * Don't do destructive transforms with git:// as that
 	 * protocol code does '[]' unwrapping of its own.
+	 * Don't change local URLs
 	 */
-	if (host[0] == '[') {
+	if (protocol != PROTO_LOCAL && host[0] == '[') {
 		end = strchr(host + 1, ']');
 		if (end) {
 			if (protocol != PROTO_GIT) {
@@ -610,17 +631,14 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	} else
 		end = host;
 
-	path = strchr(end, c);
-	if (path && !has_dos_drive_prefix(end)) {
-		if (c == ':') {
-			if (host != url || path < strchrnul(host, '/')) {
-				protocol = PROTO_SSH;
-				*path++ = '\0';
-			} else /* '/' in the host part, assume local path */
-				path = end;
+	path = strchr(end, separator);
+	if (separator == ':') {
+		if (path && protocol == PROTO_SSH) {
+			*path++ = '\0';
+		} else {/* assume local path */
+			path = end;
 		}
-	} else
-		path = end;
+	}
 
 	if (!path || !*path)
 		die("No path specified. See 'man git-pull' for valid url syntax");
@@ -629,7 +647,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	 * null-terminate hostname and point path to ~ for URL's like this:
 	 *    ssh://host.xz/~user/repo
 	 */
-	if (protocol != PROTO_LOCAL && host != url) {
+	if (protocol != PROTO_LOCAL && separator == '/') {
 		char *ptr = path;
 		if (path[1] == '~')
 			path++;
@@ -644,7 +662,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	/*
 	 * Add support for ssh port: ssh://host.xy:<port>/...
 	 */
-	if (protocol == PROTO_SSH && host != url)
+	if (protocol == PROTO_SSH && separator == '/')
 		port = get_port(end);
 
 	if (protocol == PROTO_GIT) {
diff --git a/connect.h b/connect.h
index 64fb7db..fb2de9b 100644
--- a/connect.h
+++ b/connect.h
@@ -5,6 +5,7 @@
 extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
 extern int finish_connect(struct child_process *conn);
 extern int git_connection_is_socket(struct child_process *conn);
+extern int is_local(const char *url);
 extern int server_supports(const char *feature);
 extern int parse_feature_request(const char *features, const char *feature);
 extern const char *server_feature_value(const char *feature, int *len_ret);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1d1c875..5f2b5a0 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -30,20 +30,20 @@ test_expect_success 'clone with excess parameters (1)' '
 test_expect_success 'clone with excess parameters (2)' '
 
 	rm -fr dst &&
-	test_must_fail git clone -n "file://$(pwd)/src" dst junk
+	test_must_fail git clone -n "file://$PWD/src" dst junk
 
 '
 
 test_expect_success C_LOCALE_OUTPUT 'output from clone' '
 	rm -fr dst &&
-	git clone -n "file://$(pwd)/src" dst >output 2>&1 &&
+	git clone -n "file://$PWD/src" dst >output 2>&1 &&
 	test $(grep Clon output | wc -l) = 1
 '
 
 test_expect_success 'clone does not keep pack' '
 
 	rm -fr dst &&
-	git clone -n "file://$(pwd)/src" dst &&
+	git clone -n "file://$PWD/src" dst &&
 	! test -f dst/file &&
 	! (echo dst/.git/objects/pack/pack-* | grep "\.keep")
 
@@ -172,12 +172,12 @@ test_expect_success 'clone a void' '
 	(
 		cd src-0 && git init
 	) &&
-	git clone "file://$(pwd)/src-0" target-6 2>err-6 &&
+	git clone "file://$PWD/src-0" target-6 2>err-6 &&
 	! grep "fatal:" err-6 &&
 	(
 		cd src-0 && test_commit A
 	) &&
-	git clone "file://$(pwd)/src-0" target-7 2>err-7 &&
+	git clone "file://$PWD/src-0" target-7 2>err-7 &&
 	! grep "fatal:" err-7 &&
 	# There is no reason to insist they are bit-for-bit
 	# identical, but this test should suffice for now.
@@ -291,14 +291,14 @@ test_expect_success 'setup ssh wrapper' '
 
 	GIT_SSH="$TRASH_DIRECTORY/ssh-wrapper" &&
 	export GIT_SSH &&
-	export TRASH_DIRECTORY
+	export TRASH_DIRECTORY &&
+	>"$TRASH_DIRECTORY"/ssh-output
 '
 
-clear_ssh () {
-	>"$TRASH_DIRECTORY/ssh-output"
-}
-
 expect_ssh () {
+	test_when_finished '
+	  (cd "$TRASH_DIRECTORY" && rm -f ssh-expect && >ssh-output)
+	' &&
 	{
 		case "$1" in
 		none)
@@ -310,23 +310,96 @@ expect_ssh () {
 	(cd "$TRASH_DIRECTORY" && test_cmp ssh-expect ssh-output)
 }
 
-test_expect_success 'cloning myhost:src uses ssh' '
-	clear_ssh &&
-	git clone myhost:src ssh-clone &&
-	expect_ssh myhost src
-'
-
 test_expect_success NOT_MINGW,NOT_CYGWIN 'clone local path foo:bar' '
-	clear_ssh &&
 	cp -R src "foo:bar" &&
-	git clone "./foo:bar" foobar &&
+	git clone "foo:bar" foobar &&
+	expect_ssh none
+'
+
+test_expect_success 'clone local path foo2' '
+	cp -R src "foo2" &&
+	git clone "foo2" foobar2 &&
 	expect_ssh none
 '
 
-test_expect_success 'bracketed hostnames are still ssh' '
-	clear_ssh &&
-	git clone "[myhost:123]:src" ssh-bracket-clone &&
-	expect_ssh myhost:123 src
+counter=0
+# $1 url
+# $2 none|host
+# $3 path
+test_fetch_url () {
+	counter=$(($counter + 1))
+	test_might_fail git fetch "$1" tmp$counter &&
+	expect_ssh "$2" "$3"
+}
+
+test_expect_success NOT_MINGW 'fetch c:temp is ssl' '
+	test_fetch_url c:temp c temp
+'
+
+test_expect_success MINGW 'fetch c:temp is dos drive' '
+	test_fetch_url c:temp none
+'
+
+#ip v4
+for repo in rep rep/home/project /~proj 123
+do
+	test_expect_success "fetchinghost:$repo" '
+		test_fetch_url host:$repo host $repo
+	'
+done
+
+#ipv6
+for repo in rep rep/home/project /~proj 123
+do
+	test_expect_success "fetching[::1]:$repo" '
+		test_fetch_url [::1]:$repo ::1 $repo
+	'
+done
+
+# Corner cases
+for url in [foo]bar/baz:qux [foo/bar]:baz foo/bar:baz
+do
+	test_expect_success "fetching$url is not ssh" '
+			test_fetch_url $url none
+	'
+done
+
+#with ssh:// scheme
+test_expect_success 'ssh://host.xz/home/user/repo' '
+	test_fetch_url "ssh://host.xz/home/user/repo" host.xz "/home/user/repo"
+'
+
+# from home directory
+test_expect_success 'ssh://host.xz/~repo' '
+	test_fetch_url "ssh://host.xz/~repo" host.xz "~repo"
+'
+# with port number
+test_expect_success 'ssh://host.xz:22/home/user/repo' '
+	test_fetch_url "ssh://host.xz:22/home/user/repo" "-p 22 host.xz" "/home/user/repo"
+'
+
+# from home directory with port number
+test_expect_success 'ssh://host.xz:22/~repo' '
+	test_fetch_url "ssh://host.xz:22/~repo" "-p 22 host.xz" "~repo"
+'
+
+#IPv6
+test_expect_success 'ssh://[::1]/home/user/repo' '
+	test_fetch_url "ssh://[::1]/home/user/repo" "::1" "/home/user/repo"
+'
+
+#IPv6 from home directory
+test_expect_success 'ssh://[::1]/~repo' '
+	test_fetch_url "ssh://[::1]/~repo" "::1" "~repo"
+'
+
+#IPv6 with port number
+test_expect_success 'ssh://[::1]:22/home/user/repo' '
+	test_fetch_url "ssh://[::1]:22/home/user/repo" "-p 22 ::1" "/home/user/repo"
+'
+#IPv6 from home directory with port number
+test_expect_success 'ssh://[::1]:22/~repo' '
+	test_fetch_url "ssh://[::1]:22/~repo" "-p 22 ::1" "~repo"
 '
 
 test_expect_success 'clone from a repository with two identical branches' '
diff --git a/transport.c b/transport.c
index 7202b77..a09ba95 100644
--- a/transport.c
+++ b/transport.c
@@ -885,14 +885,6 @@ void transport_take_over(struct transport *transport,
 	transport->cannot_reuse = 1;
 }
 
-static int is_local(const char *url)
-{
-	const char *colon = strchr(url, ':');
-	const char *slash = strchr(url, '/');
-	return !colon || (slash && slash < colon) ||
-		has_dos_drive_prefix(url);
-}
-
 static int is_file(const char *url)
 {
 	struct stat buf;
-- 
1.8.4.457.g424cb08

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH V4] git clone: is an URL local or ssh
@ 2013-11-04 21:18 Torsten Bögershausen
  0 siblings, 0 replies; 9+ messages in thread
From: Torsten Bögershausen @ 2013-11-04 21:18 UTC (permalink / raw)
  To: git

Commit 8d3d28f5 added test cases for URLs which should be ssh.

Add more tests testing all the combinations:
 -IPv4 or IPv6
 -path starting with "/" or with "/~"
 -with and without the ssh:// scheme

Add tests for ssh:// with port number.

When a git repository "foo:bar" exist, git clone will call
absolute_path() and git_connect() will be called with
something like "/home/user/projects/foo:bar"

Tighten the test and use "foo:bar" instead of "./foo:bar",
refactor clear_ssh() and expect_ssh() into test_clone_url().

"git clone foo/bar:baz" should not be ssh:
  Make the rule
  "if there is a slash before the first colon, it is not ssh"
  more strict by using the same is_local() in both connect.c
  and transport.c. Add a test case.

Bug fixes for corner cases:
- Handle clone of [::1]:/~repo correctly (2e7766655):
  Git should call "ssh ::1 ~repo", not ssh ::1 /~repo
  This was caused by looking at (host != url), which never
  worked for host names with literal IPv6 adresses, embedded by []
  Support for the [] URLs was introduced in 356bece0a.

- Do not tamper local URLs starting with '[' which have a ']'

Bug fix for msygit in t5601 : use $PWD insted of $(pwd)

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 Changes since V3:
 - Integrated Peffs suggestions in t5601
   (Remove clear_ssh)
 - Decide early if it is ssl or local in connect.c
 - Use "git fetch" instead of "git clone" in t5601:
   clone use absolute_path() (adding a / before :), fetch does not
 - Add a test for dos_drive "C:temp" for msys
 - Make tests work for msys by using $PWD instead of $(pwd) (file://$PWD)

 connect.c        |  50 ++++++++++++++++--------
 connect.h        |   1 +
 t/t5601-clone.sh | 117 ++++++++++++++++++++++++++++++++++++++++++++-----------
 transport.c      |   8 ----
 4 files changed, 130 insertions(+), 46 deletions(-)

diff --git a/connect.c b/connect.c
index 06e88b0..022d122 100644
--- a/connect.c
+++ b/connect.c
@@ -547,6 +547,25 @@ static char *get_port(char *host)
 
 static struct child_process no_fork;
 
+int is_local(const char *url)
+{
+	const char *colon = strchr(url, ':');
+	const char *slash = strchr(url, '/');
+	if (has_dos_drive_prefix(url))
+		return 1;
+	if (!colon)
+		return 1;
+	if (slash && slash < colon)
+		return 1;
+	if (url[0] == '[') {
+		const char *end = strchr(url + 1, ']');
+		if (!end)
+			return 1;
+		return is_local(end);
+	}
+	return 0;
+}
+
 /*
  * This returns a dummy child_process if the transport protocol does not
  * need fork(2), or a struct child_process object if it does.  Once done,
@@ -564,7 +583,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	char *url;
 	char *host, *path;
 	char *end;
-	int c;
+	int separator;
 	struct child_process *conn = &no_fork;
 	enum protocol protocol = PROTO_LOCAL;
 	int free_path = 0;
@@ -587,17 +606,19 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 		*host = '\0';
 		protocol = get_protocol(url);
 		host += 3;
-		c = '/';
+		separator = '/';
 	} else {
 		host = url;
-		c = ':';
+		protocol = is_local(url) ? PROTO_LOCAL : PROTO_SSH;
+		separator = ':';
 	}
 
 	/*
 	 * Don't do destructive transforms with git:// as that
 	 * protocol code does '[]' unwrapping of its own.
+	 * Don't change local URLs
 	 */
-	if (host[0] == '[') {
+	if (protocol != PROTO_LOCAL && host[0] == '[') {
 		end = strchr(host + 1, ']');
 		if (end) {
 			if (protocol != PROTO_GIT) {
@@ -610,17 +631,14 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	} else
 		end = host;
 
-	path = strchr(end, c);
-	if (path && !has_dos_drive_prefix(end)) {
-		if (c == ':') {
-			if (host != url || path < strchrnul(host, '/')) {
-				protocol = PROTO_SSH;
-				*path++ = '\0';
-			} else /* '/' in the host part, assume local path */
-				path = end;
+	path = strchr(end, separator);
+	if (separator == ':') {
+		if (path && protocol == PROTO_SSH) {
+			*path++ = '\0';
+		} else {/* assume local path */
+			path = end;
 		}
-	} else
-		path = end;
+	}
 
 	if (!path || !*path)
 		die("No path specified. See 'man git-pull' for valid url syntax");
@@ -629,7 +647,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	 * null-terminate hostname and point path to ~ for URL's like this:
 	 *    ssh://host.xz/~user/repo
 	 */
-	if (protocol != PROTO_LOCAL && host != url) {
+	if (protocol != PROTO_LOCAL && separator == '/') {
 		char *ptr = path;
 		if (path[1] == '~')
 			path++;
@@ -644,7 +662,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	/*
 	 * Add support for ssh port: ssh://host.xy:<port>/...
 	 */
-	if (protocol == PROTO_SSH && host != url)
+	if (protocol == PROTO_SSH && separator == '/')
 		port = get_port(end);
 
 	if (protocol == PROTO_GIT) {
diff --git a/connect.h b/connect.h
index 64fb7db..fb2de9b 100644
--- a/connect.h
+++ b/connect.h
@@ -5,6 +5,7 @@
 extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
 extern int finish_connect(struct child_process *conn);
 extern int git_connection_is_socket(struct child_process *conn);
+extern int is_local(const char *url);
 extern int server_supports(const char *feature);
 extern int parse_feature_request(const char *features, const char *feature);
 extern const char *server_feature_value(const char *feature, int *len_ret);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1d1c875..5f2b5a0 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -30,20 +30,20 @@ test_expect_success 'clone with excess parameters (1)' '
 test_expect_success 'clone with excess parameters (2)' '
 
 	rm -fr dst &&
-	test_must_fail git clone -n "file://$(pwd)/src" dst junk
+	test_must_fail git clone -n "file://$PWD/src" dst junk
 
 '
 
 test_expect_success C_LOCALE_OUTPUT 'output from clone' '
 	rm -fr dst &&
-	git clone -n "file://$(pwd)/src" dst >output 2>&1 &&
+	git clone -n "file://$PWD/src" dst >output 2>&1 &&
 	test $(grep Clon output | wc -l) = 1
 '
 
 test_expect_success 'clone does not keep pack' '
 
 	rm -fr dst &&
-	git clone -n "file://$(pwd)/src" dst &&
+	git clone -n "file://$PWD/src" dst &&
 	! test -f dst/file &&
 	! (echo dst/.git/objects/pack/pack-* | grep "\.keep")
 
@@ -172,12 +172,12 @@ test_expect_success 'clone a void' '
 	(
 		cd src-0 && git init
 	) &&
-	git clone "file://$(pwd)/src-0" target-6 2>err-6 &&
+	git clone "file://$PWD/src-0" target-6 2>err-6 &&
 	! grep "fatal:" err-6 &&
 	(
 		cd src-0 && test_commit A
 	) &&
-	git clone "file://$(pwd)/src-0" target-7 2>err-7 &&
+	git clone "file://$PWD/src-0" target-7 2>err-7 &&
 	! grep "fatal:" err-7 &&
 	# There is no reason to insist they are bit-for-bit
 	# identical, but this test should suffice for now.
@@ -291,14 +291,14 @@ test_expect_success 'setup ssh wrapper' '
 
 	GIT_SSH="$TRASH_DIRECTORY/ssh-wrapper" &&
 	export GIT_SSH &&
-	export TRASH_DIRECTORY
+	export TRASH_DIRECTORY &&
+	>"$TRASH_DIRECTORY"/ssh-output
 '
 
-clear_ssh () {
-	>"$TRASH_DIRECTORY/ssh-output"
-}
-
 expect_ssh () {
+	test_when_finished '
+	  (cd "$TRASH_DIRECTORY" && rm -f ssh-expect && >ssh-output)
+	' &&
 	{
 		case "$1" in
 		none)
@@ -310,23 +310,96 @@ expect_ssh () {
 	(cd "$TRASH_DIRECTORY" && test_cmp ssh-expect ssh-output)
 }
 
-test_expect_success 'cloning myhost:src uses ssh' '
-	clear_ssh &&
-	git clone myhost:src ssh-clone &&
-	expect_ssh myhost src
-'
-
 test_expect_success NOT_MINGW,NOT_CYGWIN 'clone local path foo:bar' '
-	clear_ssh &&
 	cp -R src "foo:bar" &&
-	git clone "./foo:bar" foobar &&
+	git clone "foo:bar" foobar &&
+	expect_ssh none
+'
+
+test_expect_success 'clone local path foo2' '
+	cp -R src "foo2" &&
+	git clone "foo2" foobar2 &&
 	expect_ssh none
 '
 
-test_expect_success 'bracketed hostnames are still ssh' '
-	clear_ssh &&
-	git clone "[myhost:123]:src" ssh-bracket-clone &&
-	expect_ssh myhost:123 src
+counter=0
+# $1 url
+# $2 none|host
+# $3 path
+test_fetch_url () {
+	counter=$(($counter + 1))
+	test_might_fail git fetch "$1" tmp$counter &&
+	expect_ssh "$2" "$3"
+}
+
+test_expect_success NOT_MINGW 'fetch c:temp is ssl' '
+	test_fetch_url c:temp c temp
+'
+
+test_expect_success MINGW 'fetch c:temp is dos drive' '
+	test_fetch_url c:temp none
+'
+
+#ip v4
+for repo in rep rep/home/project /~proj 123
+do
+	test_expect_success "fetchinghost:$repo" '
+		test_fetch_url host:$repo host $repo
+	'
+done
+
+#ipv6
+for repo in rep rep/home/project /~proj 123
+do
+	test_expect_success "fetching[::1]:$repo" '
+		test_fetch_url [::1]:$repo ::1 $repo
+	'
+done
+
+# Corner cases
+for url in [foo]bar/baz:qux [foo/bar]:baz foo/bar:baz
+do
+	test_expect_success "fetching$url is not ssh" '
+			test_fetch_url $url none
+	'
+done
+
+#with ssh:// scheme
+test_expect_success 'ssh://host.xz/home/user/repo' '
+	test_fetch_url "ssh://host.xz/home/user/repo" host.xz "/home/user/repo"
+'
+
+# from home directory
+test_expect_success 'ssh://host.xz/~repo' '
+	test_fetch_url "ssh://host.xz/~repo" host.xz "~repo"
+'
+# with port number
+test_expect_success 'ssh://host.xz:22/home/user/repo' '
+	test_fetch_url "ssh://host.xz:22/home/user/repo" "-p 22 host.xz" "/home/user/repo"
+'
+
+# from home directory with port number
+test_expect_success 'ssh://host.xz:22/~repo' '
+	test_fetch_url "ssh://host.xz:22/~repo" "-p 22 host.xz" "~repo"
+'
+
+#IPv6
+test_expect_success 'ssh://[::1]/home/user/repo' '
+	test_fetch_url "ssh://[::1]/home/user/repo" "::1" "/home/user/repo"
+'
+
+#IPv6 from home directory
+test_expect_success 'ssh://[::1]/~repo' '
+	test_fetch_url "ssh://[::1]/~repo" "::1" "~repo"
+'
+
+#IPv6 with port number
+test_expect_success 'ssh://[::1]:22/home/user/repo' '
+	test_fetch_url "ssh://[::1]:22/home/user/repo" "-p 22 ::1" "/home/user/repo"
+'
+#IPv6 from home directory with port number
+test_expect_success 'ssh://[::1]:22/~repo' '
+	test_fetch_url "ssh://[::1]:22/~repo" "-p 22 ::1" "~repo"
 '
 
 test_expect_success 'clone from a repository with two identical branches' '
diff --git a/transport.c b/transport.c
index 7202b77..a09ba95 100644
--- a/transport.c
+++ b/transport.c
@@ -885,14 +885,6 @@ void transport_take_over(struct transport *transport,
 	transport->cannot_reuse = 1;
 }
 
-static int is_local(const char *url)
-{
-	const char *colon = strchr(url, ':');
-	const char *slash = strchr(url, '/');
-	return !colon || (slash && slash < colon) ||
-		has_dos_drive_prefix(url);
-}
-
 static int is_file(const char *url)
 {
 	struct stat buf;
-- 
1.8.4.457.g424cb08

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH V4] git clone: is an URL local or ssh
@ 2013-11-04 21:20 Torsten Bögershausen
  2013-11-05  7:14 ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Torsten Bögershausen @ 2013-11-04 21:20 UTC (permalink / raw)
  To: git

Commit 8d3d28f5 added test cases for URLs which should be ssh.

Add more tests testing all the combinations:
 -IPv4 or IPv6
 -path starting with "/" or with "/~"
 -with and without the ssh:// scheme

Add tests for ssh:// with port number.

When a git repository "foo:bar" exist, git clone will call
absolute_path() and git_connect() will be called with
something like "/home/user/projects/foo:bar"

Tighten the test and use "foo:bar" instead of "./foo:bar",
refactor clear_ssh() and expect_ssh() into test_clone_url().

"git clone foo/bar:baz" should not be ssh:
  Make the rule
  "if there is a slash before the first colon, it is not ssh"
  more strict by using the same is_local() in both connect.c
  and transport.c. Add a test case.

Bug fixes for corner cases:
- Handle clone of [::1]:/~repo correctly (2e7766655):
  Git should call "ssh ::1 ~repo", not ssh ::1 /~repo
  This was caused by looking at (host != url), which never
  worked for host names with literal IPv6 adresses, embedded by []
  Support for the [] URLs was introduced in 356bece0a.

- Do not tamper local URLs starting with '[' which have a ']'

Bug fix for msygit in t5601 : use $PWD insted of $(pwd)

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 Changes since V3:
 - Integrated Peffs suggestions in t5601
   (Remove clear_ssh)
 - Decide early if it is ssl or local in connect.c
 - Use "git fetch" instead of "git clone" in t5601:
   clone use absolute_path() (adding a / before :), fetch does not
 - Add a test for dos_drive "C:temp" for msys
 - Make tests work for msys by using $PWD instead of $(pwd) (file://$PWD)

 connect.c        |  50 ++++++++++++++++--------
 connect.h        |   1 +
 t/t5601-clone.sh | 117 ++++++++++++++++++++++++++++++++++++++++++++-----------
 transport.c      |   8 ----
 4 files changed, 130 insertions(+), 46 deletions(-)

diff --git a/connect.c b/connect.c
index 06e88b0..022d122 100644
--- a/connect.c
+++ b/connect.c
@@ -547,6 +547,25 @@ static char *get_port(char *host)
 
 static struct child_process no_fork;
 
+int is_local(const char *url)
+{
+	const char *colon = strchr(url, ':');
+	const char *slash = strchr(url, '/');
+	if (has_dos_drive_prefix(url))
+		return 1;
+	if (!colon)
+		return 1;
+	if (slash && slash < colon)
+		return 1;
+	if (url[0] == '[') {
+		const char *end = strchr(url + 1, ']');
+		if (!end)
+			return 1;
+		return is_local(end);
+	}
+	return 0;
+}
+
 /*
  * This returns a dummy child_process if the transport protocol does not
  * need fork(2), or a struct child_process object if it does.  Once done,
@@ -564,7 +583,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	char *url;
 	char *host, *path;
 	char *end;
-	int c;
+	int separator;
 	struct child_process *conn = &no_fork;
 	enum protocol protocol = PROTO_LOCAL;
 	int free_path = 0;
@@ -587,17 +606,19 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 		*host = '\0';
 		protocol = get_protocol(url);
 		host += 3;
-		c = '/';
+		separator = '/';
 	} else {
 		host = url;
-		c = ':';
+		protocol = is_local(url) ? PROTO_LOCAL : PROTO_SSH;
+		separator = ':';
 	}
 
 	/*
 	 * Don't do destructive transforms with git:// as that
 	 * protocol code does '[]' unwrapping of its own.
+	 * Don't change local URLs
 	 */
-	if (host[0] == '[') {
+	if (protocol != PROTO_LOCAL && host[0] == '[') {
 		end = strchr(host + 1, ']');
 		if (end) {
 			if (protocol != PROTO_GIT) {
@@ -610,17 +631,14 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	} else
 		end = host;
 
-	path = strchr(end, c);
-	if (path && !has_dos_drive_prefix(end)) {
-		if (c == ':') {
-			if (host != url || path < strchrnul(host, '/')) {
-				protocol = PROTO_SSH;
-				*path++ = '\0';
-			} else /* '/' in the host part, assume local path */
-				path = end;
+	path = strchr(end, separator);
+	if (separator == ':') {
+		if (path && protocol == PROTO_SSH) {
+			*path++ = '\0';
+		} else {/* assume local path */
+			path = end;
 		}
-	} else
-		path = end;
+	}
 
 	if (!path || !*path)
 		die("No path specified. See 'man git-pull' for valid url syntax");
@@ -629,7 +647,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	 * null-terminate hostname and point path to ~ for URL's like this:
 	 *    ssh://host.xz/~user/repo
 	 */
-	if (protocol != PROTO_LOCAL && host != url) {
+	if (protocol != PROTO_LOCAL && separator == '/') {
 		char *ptr = path;
 		if (path[1] == '~')
 			path++;
@@ -644,7 +662,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	/*
 	 * Add support for ssh port: ssh://host.xy:<port>/...
 	 */
-	if (protocol == PROTO_SSH && host != url)
+	if (protocol == PROTO_SSH && separator == '/')
 		port = get_port(end);
 
 	if (protocol == PROTO_GIT) {
diff --git a/connect.h b/connect.h
index 64fb7db..fb2de9b 100644
--- a/connect.h
+++ b/connect.h
@@ -5,6 +5,7 @@
 extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
 extern int finish_connect(struct child_process *conn);
 extern int git_connection_is_socket(struct child_process *conn);
+extern int is_local(const char *url);
 extern int server_supports(const char *feature);
 extern int parse_feature_request(const char *features, const char *feature);
 extern const char *server_feature_value(const char *feature, int *len_ret);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1d1c875..5f2b5a0 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -30,20 +30,20 @@ test_expect_success 'clone with excess parameters (1)' '
 test_expect_success 'clone with excess parameters (2)' '
 
 	rm -fr dst &&
-	test_must_fail git clone -n "file://$(pwd)/src" dst junk
+	test_must_fail git clone -n "file://$PWD/src" dst junk
 
 '
 
 test_expect_success C_LOCALE_OUTPUT 'output from clone' '
 	rm -fr dst &&
-	git clone -n "file://$(pwd)/src" dst >output 2>&1 &&
+	git clone -n "file://$PWD/src" dst >output 2>&1 &&
 	test $(grep Clon output | wc -l) = 1
 '
 
 test_expect_success 'clone does not keep pack' '
 
 	rm -fr dst &&
-	git clone -n "file://$(pwd)/src" dst &&
+	git clone -n "file://$PWD/src" dst &&
 	! test -f dst/file &&
 	! (echo dst/.git/objects/pack/pack-* | grep "\.keep")
 
@@ -172,12 +172,12 @@ test_expect_success 'clone a void' '
 	(
 		cd src-0 && git init
 	) &&
-	git clone "file://$(pwd)/src-0" target-6 2>err-6 &&
+	git clone "file://$PWD/src-0" target-6 2>err-6 &&
 	! grep "fatal:" err-6 &&
 	(
 		cd src-0 && test_commit A
 	) &&
-	git clone "file://$(pwd)/src-0" target-7 2>err-7 &&
+	git clone "file://$PWD/src-0" target-7 2>err-7 &&
 	! grep "fatal:" err-7 &&
 	# There is no reason to insist they are bit-for-bit
 	# identical, but this test should suffice for now.
@@ -291,14 +291,14 @@ test_expect_success 'setup ssh wrapper' '
 
 	GIT_SSH="$TRASH_DIRECTORY/ssh-wrapper" &&
 	export GIT_SSH &&
-	export TRASH_DIRECTORY
+	export TRASH_DIRECTORY &&
+	>"$TRASH_DIRECTORY"/ssh-output
 '
 
-clear_ssh () {
-	>"$TRASH_DIRECTORY/ssh-output"
-}
-
 expect_ssh () {
+	test_when_finished '
+	  (cd "$TRASH_DIRECTORY" && rm -f ssh-expect && >ssh-output)
+	' &&
 	{
 		case "$1" in
 		none)
@@ -310,23 +310,96 @@ expect_ssh () {
 	(cd "$TRASH_DIRECTORY" && test_cmp ssh-expect ssh-output)
 }
 
-test_expect_success 'cloning myhost:src uses ssh' '
-	clear_ssh &&
-	git clone myhost:src ssh-clone &&
-	expect_ssh myhost src
-'
-
 test_expect_success NOT_MINGW,NOT_CYGWIN 'clone local path foo:bar' '
-	clear_ssh &&
 	cp -R src "foo:bar" &&
-	git clone "./foo:bar" foobar &&
+	git clone "foo:bar" foobar &&
+	expect_ssh none
+'
+
+test_expect_success 'clone local path foo2' '
+	cp -R src "foo2" &&
+	git clone "foo2" foobar2 &&
 	expect_ssh none
 '
 
-test_expect_success 'bracketed hostnames are still ssh' '
-	clear_ssh &&
-	git clone "[myhost:123]:src" ssh-bracket-clone &&
-	expect_ssh myhost:123 src
+counter=0
+# $1 url
+# $2 none|host
+# $3 path
+test_fetch_url () {
+	counter=$(($counter + 1))
+	test_might_fail git fetch "$1" tmp$counter &&
+	expect_ssh "$2" "$3"
+}
+
+test_expect_success NOT_MINGW 'fetch c:temp is ssl' '
+	test_fetch_url c:temp c temp
+'
+
+test_expect_success MINGW 'fetch c:temp is dos drive' '
+	test_fetch_url c:temp none
+'
+
+#ip v4
+for repo in rep rep/home/project /~proj 123
+do
+	test_expect_success "fetchinghost:$repo" '
+		test_fetch_url host:$repo host $repo
+	'
+done
+
+#ipv6
+for repo in rep rep/home/project /~proj 123
+do
+	test_expect_success "fetching[::1]:$repo" '
+		test_fetch_url [::1]:$repo ::1 $repo
+	'
+done
+
+# Corner cases
+for url in [foo]bar/baz:qux [foo/bar]:baz foo/bar:baz
+do
+	test_expect_success "fetching$url is not ssh" '
+			test_fetch_url $url none
+	'
+done
+
+#with ssh:// scheme
+test_expect_success 'ssh://host.xz/home/user/repo' '
+	test_fetch_url "ssh://host.xz/home/user/repo" host.xz "/home/user/repo"
+'
+
+# from home directory
+test_expect_success 'ssh://host.xz/~repo' '
+	test_fetch_url "ssh://host.xz/~repo" host.xz "~repo"
+'
+# with port number
+test_expect_success 'ssh://host.xz:22/home/user/repo' '
+	test_fetch_url "ssh://host.xz:22/home/user/repo" "-p 22 host.xz" "/home/user/repo"
+'
+
+# from home directory with port number
+test_expect_success 'ssh://host.xz:22/~repo' '
+	test_fetch_url "ssh://host.xz:22/~repo" "-p 22 host.xz" "~repo"
+'
+
+#IPv6
+test_expect_success 'ssh://[::1]/home/user/repo' '
+	test_fetch_url "ssh://[::1]/home/user/repo" "::1" "/home/user/repo"
+'
+
+#IPv6 from home directory
+test_expect_success 'ssh://[::1]/~repo' '
+	test_fetch_url "ssh://[::1]/~repo" "::1" "~repo"
+'
+
+#IPv6 with port number
+test_expect_success 'ssh://[::1]:22/home/user/repo' '
+	test_fetch_url "ssh://[::1]:22/home/user/repo" "-p 22 ::1" "/home/user/repo"
+'
+#IPv6 from home directory with port number
+test_expect_success 'ssh://[::1]:22/~repo' '
+	test_fetch_url "ssh://[::1]:22/~repo" "-p 22 ::1" "~repo"
 '
 
 test_expect_success 'clone from a repository with two identical branches' '
diff --git a/transport.c b/transport.c
index 7202b77..a09ba95 100644
--- a/transport.c
+++ b/transport.c
@@ -885,14 +885,6 @@ void transport_take_over(struct transport *transport,
 	transport->cannot_reuse = 1;
 }
 
-static int is_local(const char *url)
-{
-	const char *colon = strchr(url, ':');
-	const char *slash = strchr(url, '/');
-	return !colon || (slash && slash < colon) ||
-		has_dos_drive_prefix(url);
-}
-
 static int is_file(const char *url)
 {
 	struct stat buf;
-- 
1.8.4.457.g424cb08

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH V4] git clone: is an URL local or ssh
  2013-11-04 21:20 [PATCH V4] git clone: is an URL local or ssh Torsten Bögershausen
@ 2013-11-05  7:14 ` Johannes Sixt
  2013-11-05 19:35   ` [PATCH 1/2] git_connect: remove artificial limit of a remote command Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2013-11-05  7:14 UTC (permalink / raw)
  To: Torsten Bögershausen, git

Am 11/4/2013 22:20, schrieb Torsten Bögershausen:
> Bug fix for msygit in t5601 : use $PWD insted of $(pwd)

Not really. $PWD is /c/foo/bar style, but $(pwd) is c:/foo/bar, which is
equally good.

>  test_expect_success 'clone with excess parameters (2)' '
>  
>  	rm -fr dst &&
> -	test_must_fail git clone -n "file://$(pwd)/src" dst junk
> +	test_must_fail git clone -n "file://$PWD/src" dst junk

That the code change necessitates this change in the test suite is a sign
that there is something wrong with the new code.

And indeed, the original works:

C:\Temp\gittest>git clone -n file://c:/Temp/gittest/foo good
Cloning into 'good'...
remote: Counting objects: 2, done.
remote: Total 2 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (2/2), done.
Checking connectivity... done.

but the changed code does not:

C:\Temp\gittest>D:\src\mingw-git\git clone -n file://c:/Temp/gittest/foo bad
Cloning into 'bad'...
fatal: 'D:/Src/MsysGit/Temp/gittest/foo' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


Can you please make this into a series of small patches so that we can
more easily see the good and the bad parts? One of the patches could be a
clean-up of the current protocol determination and URL dissection, which
is indigestible spaghetti right now.

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] git_connect: remove artificial limit of a remote command
  2013-11-05  7:14 ` Johannes Sixt
@ 2013-11-05 19:35   ` Johannes Sixt
  2013-11-05 19:39     ` [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2013-11-05 19:35 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Since day one, function git_connect() had a limit on the command line of
the command that is invoked to make a connection. 7a33bcbe converted the
code that constructs the command to strbuf. This would have been the
right time to remove the limit, but it did not happen. Remove it now.

git_connect() uses start_command() to invoke the command; consequently,
the limits of the system still apply, but are diagnosed only at execve()
time. But these limits are more lenient than the 1K that git_connect()
imposed.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
Am 05.11.2013 08:14, schrieb Johannes Sixt:
> Can you please make this into a series of small patches so that we can
> more easily see the good and the bad parts? One of the patches could be a
> clean-up of the current protocol determination and URL dissection, which
> is indigestible spaghetti right now.

Maybe start with these two.

 connect.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/connect.c b/connect.c
index 06e88b0..6cc1f8d 100644
--- a/connect.c
+++ b/connect.c
@@ -527,8 +527,6 @@ static struct child_process *git_proxy_connect(int fd[2], char *host)
 	return proxy;
 }
 
-#define MAX_CMD_LEN 1024
-
 static char *get_port(char *host)
 {
 	char *end;
@@ -570,7 +568,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	int free_path = 0;
 	char *port = NULL;
 	const char **arg;
-	struct strbuf cmd;
+	struct strbuf cmd = STRBUF_INIT;
 
 	/* Without this we cannot rely on waitpid() to tell
 	 * what happened to our children.
@@ -676,12 +674,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 
 	conn = xcalloc(1, sizeof(*conn));
 
-	strbuf_init(&cmd, MAX_CMD_LEN);
 	strbuf_addstr(&cmd, prog);
 	strbuf_addch(&cmd, ' ');
 	sq_quote_buf(&cmd, path);
-	if (cmd.len >= MAX_CMD_LEN)
-		die("command line too long");
 
 	conn->in = conn->out = -1;
 	conn->argv = arg = xcalloc(7, sizeof(*arg));
-- 
1.8.4.33.gd68f7e8

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts
  2013-11-05 19:35   ` [PATCH 1/2] git_connect: remove artificial limit of a remote command Johannes Sixt
@ 2013-11-05 19:39     ` Johannes Sixt
  2013-11-05 20:45       ` Torsten Bögershausen
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2013-11-05 19:39 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

git_connect has grown large due to the many different protocols syntaxes
that are supported. Move the part of the function that parses the URL to
connect to into a separate function for readability.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
Apart from this "simplification", the protocol parsing code is a complete
mystery to me. There is more potential for simplification because
git_proxy_connect() and git_tcp_connect() do their own host+port parsing.


 connect.c | 80 ++++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 53 insertions(+), 27 deletions(-)

diff --git a/connect.c b/connect.c
index 6cc1f8d..a6cf345 100644
--- a/connect.c
+++ b/connect.c
@@ -543,37 +543,20 @@ static char *get_port(char *host)
 	return NULL;
 }
 
-static struct child_process no_fork;
-
 /*
- * This returns a dummy child_process if the transport protocol does not
- * need fork(2), or a struct child_process object if it does.  Once done,
- * finish the connection with finish_connect() with the value returned from
- * this function (it is safe to call finish_connect() with NULL to support
- * the former case).
- *
- * If it returns, the connect is successful; it just dies on errors (this
- * will hopefully be changed in a libification effort, to return NULL when
- * the connection failed).
+ * Extract protocol and relevant parts from the specified connection URL.
+ * The caller must free() the returned strings.
  */
-struct child_process *git_connect(int fd[2], const char *url_orig,
-				  const char *prog, int flags)
+static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
+				       char **ret_port, char **ret_path)
 {
 	char *url;
 	char *host, *path;
 	char *end;
 	int c;
-	struct child_process *conn = &no_fork;
 	enum protocol protocol = PROTO_LOCAL;
 	int free_path = 0;
 	char *port = NULL;
-	const char **arg;
-	struct strbuf cmd = STRBUF_INIT;
-
-	/* Without this we cannot rely on waitpid() to tell
-	 * what happened to our children.
-	 */
-	signal(SIGCHLD, SIG_DFL);
 
 	if (is_url(url_orig))
 		url = url_decode(url_orig);
@@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	if (protocol == PROTO_SSH && host != url)
 		port = get_port(end);
 
+	*ret_host = xstrdup(host);
+	if (port)
+		*ret_port = xstrdup(port);
+	else
+		*ret_port = NULL;
+	if (free_path)
+		*ret_path = path;
+	else
+		*ret_path = xstrdup(path);
+	free(url);
+	return protocol;
+}
+
+static struct child_process no_fork;
+
+/*
+ * This returns a dummy child_process if the transport protocol does not
+ * need fork(2), or a struct child_process object if it does.  Once done,
+ * finish the connection with finish_connect() with the value returned from
+ * this function (it is safe to call finish_connect() with NULL to support
+ * the former case).
+ *
+ * If it returns, the connect is successful; it just dies on errors (this
+ * will hopefully be changed in a libification effort, to return NULL when
+ * the connection failed).
+ */
+struct child_process *git_connect(int fd[2], const char *url,
+				  const char *prog, int flags)
+{
+	char *host, *path;
+	struct child_process *conn = &no_fork;
+	enum protocol protocol;
+	char *port;
+	const char **arg;
+	struct strbuf cmd = STRBUF_INIT;
+
+	/* Without this we cannot rely on waitpid() to tell
+	 * what happened to our children.
+	 */
+	signal(SIGCHLD, SIG_DFL);
+
+	protocol = parse_connect_url(url, &host, &port, &path);
+
 	if (protocol == PROTO_GIT) {
 		/* These underlying connection commands die() if they
 		 * cannot connect.
@@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 			     prog, path, 0,
 			     target_host, 0);
 		free(target_host);
-		free(url);
-		if (free_path)
-			free(path);
+		free(host);
+		free(port);
+		free(path);
 		return conn;
 	}
 
@@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	fd[0] = conn->out; /* read from child's stdout */
 	fd[1] = conn->in;  /* write to child's stdin */
 	strbuf_release(&cmd);
-	free(url);
-	if (free_path)
-		free(path);
+	free(host);
+	free(port);
+	free(path);
 	return conn;
 }
 
-- 
1.8.4.33.gd68f7e8

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts
  2013-11-05 19:39     ` [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts Johannes Sixt
@ 2013-11-05 20:45       ` Torsten Bögershausen
  2013-11-05 21:22         ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Torsten Bögershausen @ 2013-11-05 20:45 UTC (permalink / raw)
  To: Johannes Sixt, Torsten Bögershausen; +Cc: git

On 2013-11-05 20.39, Johannes Sixt wrote:
Thanks for picking this up, please see some minor nits inline,
and git_connect() is at the end

> -struct child_process *git_connect(int fd[2], const char *url_orig,
> -				  const char *prog, int flags)
> +static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
> +				       char **ret_port, char **ret_path)
>  {
>  	char *url;
>  	char *host, *path;
>  	char *end;
Can we put all the char * into one single line?

>  	int c;
> @@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>  	if (protocol == PROTO_SSH && host != url)
>  		port = get_port(end);
>  
> +	*ret_host = xstrdup(host);
> +	if (port)
> +		*ret_port = xstrdup(port);
> +	else
> +		*ret_port = NULL;
> +	if (free_path)
> +		*ret_path = path;
> +	else
> +		*ret_path = xstrdup(path);
> +	free(url);
> +	return protocol;
> +}
> +
> +static struct child_process no_fork;
> +
> +/*
> + * This returns a dummy child_process if the transport protocol does not
> + * need fork(2), or a struct child_process object if it does.  Once done,
> + * finish the connection with finish_connect() with the value returned from
> + * this function (it is safe to call finish_connect() with NULL to support
> + * the former case).
> + *
> + * If it returns, the connect is successful; it just dies on errors (this
> + * will hopefully be changed in a libification effort, to return NULL when
> + * the connection failed).
> + */
> +struct child_process *git_connect(int fd[2], const char *url,
> +				  const char *prog, int flags)
> +{
> +	char *host, *path;
> +	struct child_process *conn = &no_fork;
> +	enum protocol protocol;
> +	char *port;
> +	const char **arg;
> +	struct strbuf cmd = STRBUF_INIT;
> +
> +	/* Without this we cannot rely on waitpid() to tell
> +	 * what happened to our children.
> +	 */
> +	signal(SIGCHLD, SIG_DFL);
> +
> +	protocol = parse_connect_url(url, &host, &port, &path);
> +
>  	if (protocol == PROTO_GIT) {
>  		/* These underlying connection commands die() if they
>  		 * cannot connect.
> @@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>  			     prog, path, 0,
>  			     target_host, 0);
>  		free(target_host);
This is hard to see in the diff, I think we don't need target_host any more.
> -		free(url);
> -		if (free_path)
> -			free(path);
> +		free(host);
> +		free(port);
> +		free(path);
>  		return conn;
>  	}
>  
> @@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>  	fd[0] = conn->out; /* read from child's stdout */
>  	fd[1] = conn->in;  /* write to child's stdin */
>  	strbuf_release(&cmd);
> -	free(url);
> -	if (free_path)
> -		free(path);

This "end of function, free everything and return conn",
could we re-arange so that it is in the code only once ?

> +	free(host);
> +	free(port);
> +	free(path);
>  	return conn;
>  }
====================

struct child_process *git_connect(int fd[2], const char *url,
                  const char *prog, int flags)
{
    char *host, *port, *path;
    struct child_process *conn = &no_fork;
    enum protocol protocol;
    const char **arg;
    struct strbuf cmd = STRBUF_INIT;

    /* Without this we cannot rely on waitpid() to tell
     * what happened to our children.
     */
    signal(SIGCHLD, SIG_DFL);

    protocol = parse_connect_url(url, &host, &port, &path);

    if (protocol == PROTO_GIT) {
        /* These underlying connection commands die() if they
         * cannot connect.
         */
        if (git_use_proxy(host))
            conn = git_proxy_connect(fd, host);
        else
            git_tcp_connect(fd, host, flags);
        /*
         * Separate original protocol components prog and path
         * from extended host header with a NUL byte.
         *
         * Note: Do not add any other headers here!  Doing so
         * will cause older git-daemon servers to crash.
         */
        packet_write(fd[1],
                 "%s %s%chost=%s%c",
                 prog, path, 0,
                 host, 0);
    }
    else
    {
        conn = xcalloc(1, sizeof(*conn));

        strbuf_addstr(&cmd, prog);
        strbuf_addch(&cmd, ' ');
        sq_quote_buf(&cmd, path);

        conn->in = conn->out = -1;
        conn->argv = arg = xcalloc(7, sizeof(*arg));
        if (protocol == PROTO_SSH) {
            const char *ssh = getenv("GIT_SSH");
            int putty = ssh && strcasestr(ssh, "plink");
            if (!ssh) ssh = "ssh";

            *arg++ = ssh;
            if (putty && !strcasestr(ssh, "tortoiseplink"))
                *arg++ = "-batch";
            if (port) {
                /* P is for PuTTY, p is for OpenSSH */
                *arg++ = putty ? "-P" : "-p";
                *arg++ = port;
            }
            *arg++ = host;
        }
        else {
            /* remove repo-local variables from the environment */
            conn->env = local_repo_env;
            conn->use_shell = 1;
        }
        *arg++ = cmd.buf;
        *arg = NULL;

        if (start_command(conn))
            die("unable to fork");

        fd[0] = conn->out; /* read from child's stdout */
        fd[1] = conn->in;  /* write to child's stdin */
        strbuf_release(&cmd);
    }
    free(host);
    free(port);
    free(path);
    return conn;
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts
  2013-11-05 20:45       ` Torsten Bögershausen
@ 2013-11-05 21:22         ` Johannes Sixt
  2013-11-06 15:31           ` Torsten Bögershausen
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2013-11-05 21:22 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Am 05.11.2013 21:45, schrieb Torsten Bögershausen:
> On 2013-11-05 20.39, Johannes Sixt wrote:
> Thanks for picking this up, please see some minor nits inline,
> and git_connect() is at the end
> 
>> -struct child_process *git_connect(int fd[2], const char *url_orig,
>> -				  const char *prog, int flags)
>> +static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
>> +				       char **ret_port, char **ret_path)
>>  {
>>  	char *url;
>>  	char *host, *path;
>>  	char *end;
> Can we put all the char * into one single line?

The idea here was to keep the diff minimal, and that further slight
cleanups should be combined with subsequent rewrites that should happen
to this function.

>>  	int c;
>> @@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>>  	if (protocol == PROTO_SSH && host != url)
>>  		port = get_port(end);
>>  
>> +	*ret_host = xstrdup(host);
>> +	if (port)
>> +		*ret_port = xstrdup(port);
>> +	else
>> +		*ret_port = NULL;
>> +	if (free_path)
>> +		*ret_path = path;
>> +	else
>> +		*ret_path = xstrdup(path);
>> +	free(url);
>> +	return protocol;
>> +}
>> +
>> +static struct child_process no_fork;
>> +
>> +/*
>> + * This returns a dummy child_process if the transport protocol does not
>> + * need fork(2), or a struct child_process object if it does.  Once done,
>> + * finish the connection with finish_connect() with the value returned from
>> + * this function (it is safe to call finish_connect() with NULL to support
>> + * the former case).
>> + *
>> + * If it returns, the connect is successful; it just dies on errors (this
>> + * will hopefully be changed in a libification effort, to return NULL when
>> + * the connection failed).
>> + */
>> +struct child_process *git_connect(int fd[2], const char *url,
>> +				  const char *prog, int flags)
>> +{
>> +	char *host, *path;
>> +	struct child_process *conn = &no_fork;
>> +	enum protocol protocol;
>> +	char *port;
>> +	const char **arg;
>> +	struct strbuf cmd = STRBUF_INIT;
>> +
>> +	/* Without this we cannot rely on waitpid() to tell
>> +	 * what happened to our children.
>> +	 */
>> +	signal(SIGCHLD, SIG_DFL);
>> +
>> +	protocol = parse_connect_url(url, &host, &port, &path);
>> +
>>  	if (protocol == PROTO_GIT) {
>>  		/* These underlying connection commands die() if they
>>  		 * cannot connect.
>> @@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>>  			     prog, path, 0,
>>  			     target_host, 0);
>>  		free(target_host);
> This is hard to see in the diff, I think we don't need target_host any more.

I though that as well first, but no, we still need it. Further rewrites
are needed that move the port discovery from git_proxy_connect() and
git_tcp_connect() to the new parse_connect_url() before target_host can
go away. And even then it is questionable because target_host is used in
an error message and is intended to reflect the original combined
host+port portion of the URL, if I read the code correctly.

>> -		free(url);
>> -		if (free_path)
>> -			free(path);
>> +		free(host);
>> +		free(port);
>> +		free(path);
>>  		return conn;
>>  	}
>>  
>> @@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>>  	fd[0] = conn->out; /* read from child's stdout */
>>  	fd[1] = conn->in;  /* write to child's stdin */
>>  	strbuf_release(&cmd);
>> -	free(url);
>> -	if (free_path)
>> -		free(path);
> 
> This "end of function, free everything and return conn",
> could we re-arange so that it is in the code only once ?

That would be quite simple now; just place the part after the first
return into the else branch. That opens opportunities to move variable
declarations from the top of the function into the else branch.

But all of these changes should go into a separate commit, IMO, so that
the function splitting that happens here can be verified more easily.

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts
  2013-11-05 21:22         ` Johannes Sixt
@ 2013-11-06 15:31           ` Torsten Bögershausen
  0 siblings, 0 replies; 9+ messages in thread
From: Torsten Bögershausen @ 2013-11-06 15:31 UTC (permalink / raw)
  To: Johannes Sixt, Torsten Bögershausen; +Cc: git

On 2013-11-05 22.22, Johannes Sixt wrote:
> Am 05.11.2013 21:45, schrieb Torsten Bögershausen:
>> On 2013-11-05 20.39, Johannes Sixt wrote:
>> Thanks for picking this up, please see some minor nits inline,
>> and git_connect() is at the end
>>
>>> -struct child_process *git_connect(int fd[2], const char *url_orig,
>>> -				  const char *prog, int flags)
>>> +static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
>>> +				       char **ret_port, char **ret_path)
>>>  {
>>>  	char *url;
>>>  	char *host, *path;
>>>  	char *end;
>> Can we put all the char * into one single line?
> 
> The idea here was to keep the diff minimal, and that further slight
> cleanups should be combined with subsequent rewrites that should happen
> to this function.
> 
>>>  	int c;
>>> @@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>>>  	if (protocol == PROTO_SSH && host != url)
>>>  		port = get_port(end);
>>>  
>>> +	*ret_host = xstrdup(host);
>>> +	if (port)
>>> +		*ret_port = xstrdup(port);
>>> +	else
>>> +		*ret_port = NULL;
>>> +	if (free_path)
>>> +		*ret_path = path;
>>> +	else
>>> +		*ret_path = xstrdup(path);
>>> +	free(url);
>>> +	return protocol;
>>> +}
>>> +
>>> +static struct child_process no_fork;
>>> +
>>> +/*
>>> + * This returns a dummy child_process if the transport protocol does not
>>> + * need fork(2), or a struct child_process object if it does.  Once done,
>>> + * finish the connection with finish_connect() with the value returned from
>>> + * this function (it is safe to call finish_connect() with NULL to support
>>> + * the former case).
>>> + *
>>> + * If it returns, the connect is successful; it just dies on errors (this
>>> + * will hopefully be changed in a libification effort, to return NULL when
>>> + * the connection failed).
>>> + */
>>> +struct child_process *git_connect(int fd[2], const char *url,
>>> +				  const char *prog, int flags)
>>> +{
>>> +	char *host, *path;
>>> +	struct child_process *conn = &no_fork;
>>> +	enum protocol protocol;
>>> +	char *port;
>>> +	const char **arg;
>>> +	struct strbuf cmd = STRBUF_INIT;
>>> +
>>> +	/* Without this we cannot rely on waitpid() to tell
>>> +	 * what happened to our children.
>>> +	 */
>>> +	signal(SIGCHLD, SIG_DFL);
>>> +
>>> +	protocol = parse_connect_url(url, &host, &port, &path);
>>> +
>>>  	if (protocol == PROTO_GIT) {
>>>  		/* These underlying connection commands die() if they
>>>  		 * cannot connect.
>>> @@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>>>  			     prog, path, 0,
>>>  			     target_host, 0);
>>>  		free(target_host);
>> This is hard to see in the diff, I think we don't need target_host any more.
> 
> I though that as well first, but no, we still need it. Further rewrites
> are needed that move the port discovery from git_proxy_connect() and
> git_tcp_connect() to the new parse_connect_url() before target_host can
> go away. And even then it is questionable because target_host is used in
> an error message and is intended to reflect the original combined
> host+port portion of the URL, if I read the code correctly.
> 
>>> -		free(url);
>>> -		if (free_path)
>>> -			free(path);
>>> +		free(host);
>>> +		free(port);
>>> +		free(path);
>>>  		return conn;
>>>  	}
>>>  
>>> @@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
>>>  	fd[0] = conn->out; /* read from child's stdout */
>>>  	fd[1] = conn->in;  /* write to child's stdin */
>>>  	strbuf_release(&cmd);
>>> -	free(url);
>>> -	if (free_path)
>>> -		free(path);
>>
>> This "end of function, free everything and return conn",
>> could we re-arange so that it is in the code only once ?
> 
> That would be quite simple now; just place the part after the first
> return into the else branch. That opens opportunities to move variable
> declarations from the top of the function into the else branch.
> 
> But all of these changes should go into a separate commit, IMO, so that
> the function splitting that happens here can be verified more easily.
> 
> -- Hannes
Agreed on all points, (some re-reading was needed)

I will first focus on the test cases,
since having god test cases eases us the re-factoring later on.
Thanks
/Torsten

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-11-06 15:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-04 21:20 [PATCH V4] git clone: is an URL local or ssh Torsten Bögershausen
2013-11-05  7:14 ` Johannes Sixt
2013-11-05 19:35   ` [PATCH 1/2] git_connect: remove artificial limit of a remote command Johannes Sixt
2013-11-05 19:39     ` [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts Johannes Sixt
2013-11-05 20:45       ` Torsten Bögershausen
2013-11-05 21:22         ` Johannes Sixt
2013-11-06 15:31           ` Torsten Bögershausen
  -- strict thread matches above, loose matches on Subject: below --
2013-11-04 21:18 [PATCH V4] git clone: is an URL local or ssh Torsten Bögershausen
2013-11-04 21:04 Torsten Bögershausen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).