Git development

Git development
 help / color / mirror / Atom feed

* [PATCH v2 0/4] urlmatch: allow regexp-based matches
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20170123130635.29577-1-patrick.steinhardt@elego.de>

Hi,

This is version two of my patch series.

The use case is to be able to configure an HTTP proxy for all
subdomains of a domain where there are hundreds of subdomains.

Previously, I have been using complete regular expressions with
an escape-mechanism to match the configuration key's URLs.
According to Junio's comments, I changed this mechanism to a much
simpler one, where the user is only allowed to use globbing for
the host part of the URL. That is a user can now specify a key
`http.https://*.example.com` to match all sub-domains of
`example.com`. For now I've decided to implement it such that a
single `*` matches a single subdomain only, so for example
`https://foo.bar.example.com` would not match in this case. This
is similar to how shell-globbing works usually, so it should not
be of much surprise. It's also highlighted in the documentation.

I did not include an interdiff as too much has changed between
the two versions.

Regards
Patrick

Patrick Steinhardt (4):
  mailmap: add Patrick Steinhardt's work address
  urlmatch: enable normalization of URLs with globs
  urlmatch: split host and port fields in `struct url_info`
  urlmatch: allow globbing for the URL host part

 .mailmap                 |  1 +
 Documentation/config.txt |  5 +++-
 t/t1300-repo-config.sh   | 36 +++++++++++++++++++++++++
 urlmatch.c               | 68 +++++++++++++++++++++++++++++++++++++++++-------
 urlmatch.h               |  9 ++++---
 5 files changed, 104 insertions(+), 15 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH v2 1/4] mailmap: add Patrick Steinhardt's work address
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20170123130635.29577-1-patrick.steinhardt@elego.de>

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 9c87a3840..ea59205b9 100644
--- a/.mailmap
+++ b/.mailmap
@@ -177,6 +177,7 @@ Paolo Bonzini <bonzini@gnu.org> <paolo.bonzini@lu.unisi.ch>
 Pascal Obry <pascal@obry.net> <pascal.obry@gmail.com>
 Pascal Obry <pascal@obry.net> <pascal.obry@wanadoo.fr>
 Pat Notz <patnotz@gmail.com> <pknotz@sandia.gov>
+Patrick Steinhardt <ps@pks.im> <patrick.steinhardt@elego.de>
 Paul Mackerras <paulus@samba.org> <paulus@dorrigo.(none)>
 Paul Mackerras <paulus@samba.org> <paulus@pogo.(none)>
 Peter Baumann <waste.manager@gmx.de> <Peter.B.Baumann@stud.informatik.uni-erlangen.de>
-- 
2.11.0


^ permalink raw reply related

* [PATCH v2 2/4] urlmatch: enable normalization of URLs with globs
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20170123130635.29577-1-patrick.steinhardt@elego.de>

The `url_normalize` function is used to validate and normalize URLs. As
such, it does not allow for some special characters to be part of the
URLs that are to be normalized. As we want to allow using globs in some
configuration keys making use of URLs, namely `http.<url>.<key>`, but
still normalize them, we need to somehow enable some additional allowed
characters.

To do this without having to change all callers of `url_normalize`,
where most do not actually want globbing at all, we split off another
function `url_normalize_1`. This function accepts an additional
parameter `allow_globs`, which is subsequently called by `url_normalize`
with `allow_globs=0`.

As of now, this function is not used with globbing enabled. A caller
will be added in the following commit.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index 132d342bc..d350478c0 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,7 +63,7 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }

-char *url_normalize(const char *url, struct url_info *out_info)
+static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
 	 * Normalize NUL-terminated url using the following rules:
@@ -191,7 +191,12 @@ char *url_normalize(const char *url, struct url_info *out_info)
 		strbuf_release(&norm);
 		return NULL;
 	}
-	spanned = strspn(url, URL_HOST_CHARS);
+
+	if (allow_globs)
+		spanned = strspn(url, URL_HOST_CHARS "*");
+	else
+		spanned = strspn(url, URL_HOST_CHARS);
+
 	if (spanned < colon_ptr - url) {
 		/* Host name has invalid characters */
 		if (out_info) {
@@ -380,6 +385,11 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return result;
 }

+char *url_normalize(const char *url, struct url_info *out_info)
+{
+	return url_normalize_1(url, out_info, 0);
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
-- 
2.11.0

^ permalink raw reply related

* [PATCH v2 3/4] urlmatch: split host and port fields in `struct url_info`
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20170123130635.29577-1-patrick.steinhardt@elego.de>

The `url_info` structure contains information about a normalized URL
with the URL's components being represented by different fields. The
host and port part though are to be accessed by the same `host` field,
so that getting the host and/or port separately becomes more involved
than really necessary.

To make the port more readily accessible, split up the host and port
fields. Namely, the `host_len` will not include the port length anymore
and a new `port_off` field has been added which includes the offset to
the port, if available.

The only user of these fields is `url_normalize_1`. This change makes it
easier later on to treat host and port differently when introducing
globs for domains.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 16 ++++++++++++----
 urlmatch.h |  9 +++++----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index d350478c0..e328905eb 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -104,7 +104,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 	struct strbuf norm;
 	size_t spanned;
 	size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0;
-	size_t host_off=0, host_len=0, port_len=0, path_off, path_len, result_len;
+	size_t host_off=0, host_len=0, port_off=0, port_len=0, path_off, path_len, result_len;
 	const char *slash_ptr, *at_ptr, *colon_ptr, *path_start;
 	char *result;
 
@@ -263,6 +263,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 				return NULL;
 			}
 			strbuf_addch(&norm, ':');
+			port_off = norm.len;
 			strbuf_add(&norm, url, slash_ptr - url);
 			port_len = slash_ptr - url;
 		}
@@ -270,7 +271,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		url = slash_ptr;
 	}
 	if (host_off)
-		host_len = norm.len - host_off;
+		host_len = norm.len - host_off - (port_len ? port_len + 1 : 0);
 
 
 	/*
@@ -378,6 +379,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		out_info->passwd_len = passwd_len;
 		out_info->host_off = host_off;
 		out_info->host_len = host_len;
+		out_info->port_off = port_off;
 		out_info->port_len = port_len;
 		out_info->path_off = path_off;
 		out_info->path_len = path_len;
@@ -464,11 +466,17 @@ static int match_urls(const struct url_info *url,
 		usermatched = 1;
 	}
 
-	/* check the host and port */
+	/* check the host */
 	if (url_prefix->host_len != url->host_len ||
 	    strncmp(url->url + url->host_off,
 		    url_prefix->url + url_prefix->host_off, url->host_len))
-		return 0; /* host names and/or ports do not match */
+		return 0; /* host names do not match */
+
+	/* check the port */
+	if (url_prefix->port_len != url->port_len ||
+	    strncmp(url->url + url->port_off,
+		    url_prefix->url + url_prefix->port_off, url->port_len))
+		return 0; /* ports do not match */
 
 	/* check the path */
 	pathmatchlen = url_match_prefix(
diff --git a/urlmatch.h b/urlmatch.h
index 528862adc..0ea812b03 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -18,11 +18,12 @@ struct url_info {
 	size_t passwd_len;	/* length of passwd; if passwd_off != 0 but
 				   passwd_len == 0, an empty passwd was given */
 	size_t host_off;	/* offset into url to start of host name (0 => none) */
-	size_t host_len;	/* length of host name; this INCLUDES any ':portnum';
+	size_t host_len;	/* length of host name;
 				 * file urls may have host_len == 0 */
-	size_t port_len;	/* if a portnum is present (port_len != 0), it has
-				 * this length (excluding the leading ':') at the
-				 * end of the host name (always 0 for file urls) */
+	size_t port_off;	/* offset into url to start of port number (0 => none) */
+	size_t port_len;	/* if a portnum is present (port_off != 0), it has
+				 * this length (excluding the leading ':') starting
+				 * from port_off (always 0 for file urls) */
 	size_t path_off;	/* offset into url to the start of the url path;
 				 * this will always point to a '/' character
 				 * after the url has been normalized */
-- 
2.11.0


^ permalink raw reply related

* [PATCH v2 4/4] urlmatch: allow globbing for the URL host part
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20170123130635.29577-1-patrick.steinhardt@elego.de>

The URL matching function computes for two URLs whether they match not.
The match is performed by splitting up the URL into different parts and
then doing an exact comparison with the to-be-matched URL.

The main user of `urlmatch` is the configuration subsystem. It allows to
set certain configurations based on the URL which is being connected to
via keys like `http.<url>.*`. A common use case for this is to set
proxies for only some remotes which match the given URL. Unfortunately,
having exact matches for all parts of the URL can become quite tedious
in some setups. Imagine for example a corporate network where there are
dozens or even hundreds of subdomains, which would have to be configured
individually.

This commit introduces the ability to use globbing in the host-part of
the URLs. A user can simply specify a `*` as part of the host name to
match all subdomains at this level. For example adding a configuration
key `http.https://*.example.com.proxy` will match all subdomains of
`https://example.com`.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 Documentation/config.txt |  5 ++++-
 t/t1300-repo-config.sh   | 36 ++++++++++++++++++++++++++++++++++++
 urlmatch.c               | 38 ++++++++++++++++++++++++++++++++++----
 3 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 506431267..a78921c2b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1914,7 +1914,10 @@ http.<url>.*::
   must match exactly between the config key and the URL.
 
 . Host/domain name (e.g., `example.com` in `https://example.com/`).
-  This field must match exactly between the config key and the URL.
+  This field must match between the config key and the URL. It is
+  possible to use globs in the config key to match all subdomains, e.g.
+  `https://*.example.com/` to match all subdomains of `example.com`. Note
+  that a glob only every matches a single part of the hostname.
 
 . Port number (e.g., `8080` in `http://example.com:8080/`).
   This field must match exactly between the config key and the URL.
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5a2..ec545e092 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
+test_expect_success 'glob-based urlmatch' '
+	cat >.git/config <<-\EOF &&
+	[http]
+		sslVerify
+	[http "https://*.example.com"]
+		sslVerify = false
+		cookieFile = /tmp/cookie.txt
+	EOF
+
+	test_expect_code 1 git config --bool --get-urlmatch doesnt.exist https://good.example.com >actual &&
+	test_must_be_empty actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://good-example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.sslverify https://deep.nested.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo false >expect &&
+	git config --bool --get-urlmatch http.sslverify https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	{
+		echo http.cookiefile /tmp/cookie.txt &&
+		echo http.sslverify false
+	} >expect &&
+	git config --get-urlmatch HTTP https://good.example.com >actual &&
+	test_cmp expect actual
+'
+
 # good section hygiene
 test_expect_failure 'unsetting the last key in a section removes header' '
 	cat >.git/config <<-\EOF &&
diff --git a/urlmatch.c b/urlmatch.c
index e328905eb..53ff972a6 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,6 +63,38 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static int match_host(const struct url_info *url_info,
+		      const struct url_info *pattern_info)
+{
+	char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
+	char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
+	char *url_tok, *pat_tok, *url_save, *pat_save;
+	int matching;
+
+	url_tok = strtok_r(url, ".", &url_save);
+	pat_tok = strtok_r(pat, ".", &pat_save);
+
+	for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
+				   pat_tok = strtok_r(NULL, ".", &pat_save)) {
+		if (!strcmp(pat_tok, "*"))
+			continue; /* a simple glob matches everything */
+
+		if (strcmp(url_tok, pat_tok)) {
+			/* subdomains do not match */
+			matching = 0;
+			break;
+		}
+	}
+
+	/* matching if both URL and pattern are at their ends */
+	matching = (url_tok == NULL && pat_tok == NULL);
+
+	free(url);
+	free(pat);
+
+	return matching;
+}
+
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
@@ -467,9 +499,7 @@ static int match_urls(const struct url_info *url,
 	}
 
 	/* check the host */
-	if (url_prefix->host_len != url->host_len ||
-	    strncmp(url->url + url->host_off,
-		    url_prefix->url + url_prefix->host_off, url->host_len))
+	if (!match_host(url, url_prefix))
 		return 0; /* host names do not match */
 
 	/* check the port */
@@ -512,7 +542,7 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		struct url_info norm_info;
 
 		config_url = xmemdupz(key, dot - key);
-		norm_url = url_normalize(config_url, &norm_info);
+		norm_url = url_normalize_1(config_url, &norm_info, 1);
 		free(config_url);
 		if (!norm_url)
 			return 0;
-- 
2.11.0


^ permalink raw reply related

* Re: [RFC 2/2] grep: use '/' delimiter for paths
From: Phil Hord @ 2017-01-24 17:18 UTC (permalink / raw)
  To: Stefan Hajnoczi, Junio C Hamano; +Cc: Jeff King, Git
In-Reply-To: <20170123132918.GM29186@stefanha-x1.localdomain>

On Tue, Jan 24, 2017 at 1:54 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > The use of "git show" you are demonstrating is still about showing
> > the commit object, whose behaviour is defined to show the log
> > message and the diff relative to its sole parent, limited to the
> > paths that match the pathspec.
> >
> > It is perfectly fine and desirable that "git show <commit>:<path>"
> > and "git show <commit> -- <path>" behaves differently.  These are
> > two completely different features.
>
> Thanks for explaining guys.  It all makes logical sense.  I just hope I
> remember the distinctions in that table for everyday git use.


Familiar itch; previous discussion here:

https://public-inbox.org/git/1377528372-31206-1-git-send-email-hordp@cisco.com/

Phil

^ permalink raw reply

* diff --color-words breaks spacing
From: Phil Hord @ 2017-01-24 17:39 UTC (permalink / raw)
  To: Git

I noticed some weird spacing when comparing files with git diff
--color-words.  The space before a colored word disappears sometimes.

$ git --version
git version 2.11.0.485.g4e59582ff

echo "FOO foo; foo = bar" > a
echo "FOO foo = baz" > b
git diff --color-words --no-index a b
FOOfoo; foo = barbaz

There should be a space after FOO in the diff, even if git doesn't
think "foo" and "foo;" are the same word.

If I remove the semicolon, it looks better, but in fact it only moves
the error later. The missing space is now between the two "foo" words.

echo "FOO foo foo = bar" > a
echo "FOO foo = baz" > b
git diff --color-words --no-index a b
FOO foofoo = barbaz

Here's the same with the color codes changed to text for purposes of this email:

echo "FOO foo; foo = bar" > a
echo "FOO foo = baz" > b
git diff --color-words --no-index a b | tr  \\033 E
FOOE[31mfoo;E[m foo = E[31mbarE[mE[32mbazE[m

echo "FOO foo foo = bar" > a
echo "FOO foo = baz" > b
git diff --color-words --no-index a b | tr  \\033 E
FOO fooE[31mfooE[m = E[31mbarE[mE[32mbazE[m

^ permalink raw reply

* Re: [PATCH v2 4/4] urlmatch: allow globbing for the URL host part
From: Philip Oakley @ 2017-01-24 17:52 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20170124170031.18069-5-patrick.steinhardt@elego.de>

From: "Patrick Steinhardt" <patrick.steinhardt@elego.de>

a quick comment on the documentation part ..

> The URL matching function computes for two URLs whether they match not.
> The match is performed by splitting up the URL into different parts and
> then doing an exact comparison with the to-be-matched URL.
>
> The main user of `urlmatch` is the configuration subsystem. It allows to
> set certain configurations based on the URL which is being connected to
> via keys like `http.<url>.*`. A common use case for this is to set
> proxies for only some remotes which match the given URL. Unfortunately,
> having exact matches for all parts of the URL can become quite tedious
> in some setups. Imagine for example a corporate network where there are
> dozens or even hundreds of subdomains, which would have to be configured
> individually.
>
> This commit introduces the ability to use globbing in the host-part of
> the URLs. A user can simply specify a `*` as part of the host name to
> match all subdomains at this level. For example adding a configuration
> key `http.https://*.example.com.proxy` will match all subdomains of
> `https://example.com`.
>
> Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
> ---
> Documentation/config.txt |  5 ++++-
> t/t1300-repo-config.sh   | 36 ++++++++++++++++++++++++++++++++++++
> urlmatch.c               | 38 ++++++++++++++++++++++++++++++++++----
> 3 files changed, 74 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 506431267..a78921c2b 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1914,7 +1914,10 @@ http.<url>.*::
>   must match exactly between the config key and the URL.
>
> . Host/domain name (e.g., `example.com` in `https://example.com/`).
> -  This field must match exactly between the config key and the URL.
> +  This field must match between the config key and the URL. It is
> +  possible to use globs in the config key to match all subdomains, e.g.
> +  `https://*.example.com/` to match all subdomains of `example.com`. Note
> +  that a glob only every matches a single part of the hostname.

[s/every/ever/ ?]

the "match all subdomains" appears to contradict the "a glob only ever 
matches a single part ".

Maybe borrow the example from the 0/4 cover letter
"so for example `https://foo.bar.example.com` would not match in the case of 
`http.https://*.example.com` " (If I understood it correctly.

A simple example often clarifies much better than more words.
--
Philip

>
> . Port number (e.g., `8080` in `http://example.com:8080/`).
>   This field must match exactly between the config key and the URL.
> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index 923bfc5a2..ec545e092 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh
> @@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' '
>  test_cmp expect actual
> '
>
> +test_expect_success 'glob-based urlmatch' '
> + cat >.git/config <<-\EOF &&
> + [http]
> + sslVerify
> + [http "https://*.example.com"]
> + sslVerify = false
> + cookieFile = /tmp/cookie.txt
> + EOF
> +
> + test_expect_code 1 git config --bool --get-urlmatch doesnt.exist 
> https://good.example.com >actual &&
> + test_must_be_empty actual &&
> +
> + echo true >expect &&
> + git config --bool --get-urlmatch http.SSLverify https://example.com 
>  >actual &&
> + test_cmp expect actual &&
> +
> + echo true >expect &&
> + git config --bool --get-urlmatch http.SSLverify https://good-example.com 
>  >actual &&
> + test_cmp expect actual &&
> +
> + echo true >expect &&
> + git config --bool --get-urlmatch http.sslverify 
> https://deep.nested.example.com >actual &&
> + test_cmp expect actual &&
> +
> + echo false >expect &&
> + git config --bool --get-urlmatch http.sslverify https://good.example.com 
>  >actual &&
> + test_cmp expect actual &&
> +
> + {
> + echo http.cookiefile /tmp/cookie.txt &&
> + echo http.sslverify false
> + } >expect &&
> + git config --get-urlmatch HTTP https://good.example.com >actual &&
> + test_cmp expect actual
> +'
> +
> # good section hygiene
> test_expect_failure 'unsetting the last key in a section removes header' '
>  cat >.git/config <<-\EOF &&
> diff --git a/urlmatch.c b/urlmatch.c
> index e328905eb..53ff972a6 100644
> --- a/urlmatch.c
> +++ b/urlmatch.c
> @@ -63,6 +63,38 @@ static int append_normalized_escapes(struct strbuf 
> *buf,
>  return 1;
> }
>
> +static int match_host(const struct url_info *url_info,
> +       const struct url_info *pattern_info)
> +{
> + char *url = xmemdupz(url_info->url + url_info->host_off, 
> url_info->host_len);
> + char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, 
> pattern_info->host_len);
> + char *url_tok, *pat_tok, *url_save, *pat_save;
> + int matching;
> +
> + url_tok = strtok_r(url, ".", &url_save);
> + pat_tok = strtok_r(pat, ".", &pat_save);
> +
> + for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
> +    pat_tok = strtok_r(NULL, ".", &pat_save)) {
> + if (!strcmp(pat_tok, "*"))
> + continue; /* a simple glob matches everything */
> +
> + if (strcmp(url_tok, pat_tok)) {
> + /* subdomains do not match */
> + matching = 0;
> + break;
> + }
> + }
> +
> + /* matching if both URL and pattern are at their ends */
> + matching = (url_tok == NULL && pat_tok == NULL);
> +
> + free(url);
> + free(pat);
> +
> + return matching;
> +}
> +
> static char *url_normalize_1(const char *url, struct url_info *out_info, 
> char allow_globs)
> {
>  /*
> @@ -467,9 +499,7 @@ static int match_urls(const struct url_info *url,
>  }
>
>  /* check the host */
> - if (url_prefix->host_len != url->host_len ||
> -     strncmp(url->url + url->host_off,
> -     url_prefix->url + url_prefix->host_off, url->host_len))
> + if (!match_host(url, url_prefix))
>  return 0; /* host names do not match */
>
>  /* check the port */
> @@ -512,7 +542,7 @@ int urlmatch_config_entry(const char *var, const char 
> *value, void *cb)
>  struct url_info norm_info;
>
>  config_url = xmemdupz(key, dot - key);
> - norm_url = url_normalize(config_url, &norm_info);
> + norm_url = url_normalize_1(config_url, &norm_info, 1);
>  free(config_url);
>  if (!norm_url)
>  return 0;
> -- 
> 2.11.0
>
> 


^ permalink raw reply

* Re: [PATCH v2 0/5] string-list: make string_list_sort() reentrant
From: René Scharfe @ 2017-01-24 18:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano, Johannes Schindelin
In-Reply-To: <20170123235445.qsejumltutd2vrhd@sigill.intra.peff.net>

Am 24.01.2017 um 00:54 schrieb Jeff King:
> The speed looks like a reasonable outcome. I'm torn on the qsort_r()
> demo patch. I don't think it looks too bad. OTOH, I don't think I would
> want to deal with the opposite-argument-order versions.

The code itself may look OK, but it's not really necessary and the 
special implementation for Linux makes increases maintenance costs.  Can 
we save it for later and first give the common implemention a chance to 
prove itself?

> Is there any interest in people adding the ISO qsort_s() to their libc
> implementations? It seems like it's been a fair number of years by now.

https://sourceware.org/ml/libc-alpha/2014-12/msg00513.html is the last 
post mentioning qsort_s on the glibc mailing list, but it didn't even 
make it into https://sourceware.org/glibc/wiki/Development_Todo/Master.
Not sure what's planned in BSD land, didn't find anything (but didn't 
look too hard).

René

^ permalink raw reply

* Re: [DEMO][PATCH v2 6/5] compat: add a qsort_s() implementation based on GNU's qsort_r(1)
From: René Scharfe @ 2017-01-24 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Jeff King, Johannes Schindelin
In-Reply-To: <xmqq60l5sihz.fsf@gitster.mtv.corp.google.com>

Am 23.01.2017 um 20:07 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
> 
>> Implement qsort_s() as a wrapper to the GNU version of qsort_r(1) and
>> use it on Linux.  Performance increases slightly:
>>
>> Test                         HEAD^             HEAD
>> --------------------------------------------------------------------
>> 0071.2: sort(1)              0.10(0.20+0.02)   0.10(0.21+0.01) +0.0%
>> 0071.3: string_list_sort()   0.17(0.15+0.01)   0.16(0.15+0.01) -5.9%
>>
>> Additionally the unstripped size of compat/qsort_s.o falls from 24576
>> to 16544 bytes in my build.
>>
>> IMHO these savings aren't worth the increased complexity of having to
>> support two implementations.
> 
> I do worry about having to support more implementations in the
> future that have different function signature for the comparison
> callbacks, which will make things ugly, but this addition alone
> doesn't look too bad to me.

It is unfair of me to show a 5% speedup and then recommend to not
include it. ;-)  That difference won't be measurable in real use cases
and the patch is not necessary.  This patch is simple, but the overall
complexity (incl. #ifdefs etc.) will be lower without it.

But here's another one, with even higher performance and with an even
bigger recommendation to not include it! :)  It veers off into another
direction: Parallel execution.  It requires thread-safe comparison
functions, which might surprise callers.  The value 1000 for the minimum
number of items before threading kicks in is just a guess, not based on
measurements.  So it's quite raw -- and I'm not sure why it's still a
bit slower than sort(1).

Test                         HEAD^             HEAD
---------------------------------------------------------------------
0071.2: sort(1)              0.10(0.18+0.03)   0.10(0.20+0.02) +0.0%
0071.3: string_list_sort()   0.17(0.14+0.02)   0.11(0.18+0.02) -35.3%

---
 compat/qsort_s.c | 76 ++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 58 insertions(+), 18 deletions(-)

diff --git a/compat/qsort_s.c b/compat/qsort_s.c
index 52d1f0a73d..1304a089af 100644
--- a/compat/qsort_s.c
+++ b/compat/qsort_s.c
@@ -1,4 +1,5 @@
 #include "../git-compat-util.h"
+#include "../thread-utils.h"
 
 /*
  * A merge sort implementation, simplified from the qsort implementation
@@ -6,29 +7,58 @@
  * Added context pointer, safety checks and return value.
  */
 
-static void msort_with_tmp(void *b, size_t n, size_t s,
-			   int (*cmp)(const void *, const void *, void *),
-			   char *t, void *ctx)
+#define MIN_ITEMS_FOR_THREAD 1000
+
+struct work {
+	int nr_threads;
+	void *base;
+	size_t nmemb;
+	size_t size;
+	char *tmp;
+	int (*cmp)(const void *, const void *, void *);
+	void *ctx;
+};
+
+static void *msort_with_tmp(void *work)
 {
+	struct work one, two, *w = work;
 	char *tmp;
 	char *b1, *b2;
 	size_t n1, n2;
+	size_t s, n;
 
-	if (n <= 1)
-		return;
+	if (w->nmemb <= 1)
+		return NULL;
 
-	n1 = n / 2;
-	n2 = n - n1;
-	b1 = b;
-	b2 = (char *)b + (n1 * s);
+	one = two = *w;
+	one.nr_threads /= 2;
+	two.nr_threads -= one.nr_threads;
+	n = one.nmemb;
+	s = one.size;
+	n1 = one.nmemb = n / 2;
+	n2 = two.nmemb = n - n1;
+	b1 = one.base;
+	b2 = two.base = b1 + n1 * s;
+	two.tmp += n1 * s;
 
-	msort_with_tmp(b1, n1, s, cmp, t, ctx);
-	msort_with_tmp(b2, n2, s, cmp, t, ctx);
+#ifndef NO_PTHREADS
+	if (one.nr_threads && n > MIN_ITEMS_FOR_THREAD) {
+		pthread_t thread;
+		int err = pthread_create(&thread, NULL, msort_with_tmp, &one);
+		msort_with_tmp(&two);
+		if (err || pthread_join(thread, NULL))
+			msort_with_tmp(&one);
+	} else
+#endif
+	{
+		msort_with_tmp(&one);
+		msort_with_tmp(&two);
+	}
 
-	tmp = t;
+	tmp = one.tmp;
 
 	while (n1 > 0 && n2 > 0) {
-		if (cmp(b1, b2, ctx) <= 0) {
+		if (one.cmp(b1, b2, one.ctx) <= 0) {
 			memcpy(tmp, b1, s);
 			tmp += s;
 			b1 += s;
@@ -42,7 +72,8 @@ static void msort_with_tmp(void *b, size_t n, size_t s,
 	}
 	if (n1 > 0)
 		memcpy(tmp, b1, n1 * s);
-	memcpy(b, t, (n - n2) * s);
+	memcpy(one.base, one.tmp, (n - n2) * s);
+	return NULL;
 }
 
 int git_qsort_s(void *b, size_t n, size_t s,
@@ -50,20 +81,29 @@ int git_qsort_s(void *b, size_t n, size_t s,
 {
 	const size_t size = st_mult(n, s);
 	char buf[1024];
+	struct work w;
 
 	if (!n)
 		return 0;
 	if (!b || !cmp)
 		return -1;
 
+	w.nr_threads = online_cpus();
+	w.base = b;
+	w.nmemb = n;
+	w.size = s;
+	w.cmp = cmp;
+	w.ctx = ctx;
+
 	if (size < sizeof(buf)) {
 		/* The temporary array fits on the small on-stack buffer. */
-		msort_with_tmp(b, n, s, cmp, buf, ctx);
+		w.tmp = buf;
 	} else {
 		/* It's somewhat large, so malloc it.  */
-		char *tmp = xmalloc(size);
-		msort_with_tmp(b, n, s, cmp, tmp, ctx);
-		free(tmp);
+		w.tmp = xmalloc(size);
 	}
+	msort_with_tmp(&w);
+	if (w.tmp != buf)
+		free(w.tmp);
 	return 0;
 }
-- 
2.11.0


^ permalink raw reply related

* Re: submodule network operations [WAS: Re: [RFC/PATCH 0/4] working tree operations: support superprefix]
From: Stefan Beller @ 2017-01-24 18:49 UTC (permalink / raw)
  To: Brian J. Davis; +Cc: Brandon Williams, git@vger.kernel.org, David Turner
In-Reply-To: <04fe8035-dbf0-83d2-c465-f746b99ce609@gmail.com>

On Sat, Jan 21, 2017 at 7:53 AM, Brian J. Davis <bitminer@gmail.com> wrote:
>
> On 1/19/2017 7:22 PM, Stefan Beller wrote:
>>>>
>>>> Between the init and the update step you can modify the URLs.
>>>> These commands are just a repetition from the first email, but the
>>>> git commands can be viewed as moving from one state to another
>>>> for submodules; submodules itself can be seen as a state machine
>>>> according to that proposed documentation. Maybe such a state machine
>>>> makes it easier to understand for some people.
>>>
>>>
>>> "Between the init and the update step you can modify the URLs."  Yes I
>>> can
>>> and have to... wish it was not this way.
>>
>> So how would yo u rather want to do it?
>> look at the .gitmodules file beforehand and then run a "submodule update"
>> ?
>> Or a thing like
>>
>>      git -c url.https://internal.insteadOf git://github.com/ \
>>          -c submodule.record-rewritten-urls submodule update
>>
>> (no need for init there as theoretically there is not
>> need for such an intermediate step)
>>
> "Yes please and thank you" ... both.
>
> My thought was to simply allow addition to .gitmodules.  If I understand
> correctly you are proposing, to override these at the command line and
> possibly rewrite them on submodule update, but maybe not save or add to
> .gitmodules. I would then propose both.
> 1) allow user to add to .gitmodules for those who do not care that
> "outsiders" know the internal dev server
> and
> 2) allow to rewrite while not saving to .gitmodules on fresh clone and
> submodule update for thoes that do not want ousiders to known internal dev
> server.
> and possibly:
> 3) allow at command line to add remote to .gitmodules on submodule commands
> (note add optoin in -c <name> = <value> pair)
>
> .gitmodules before:
>
> [submodule "subprojects/wrangler"]
>         path = subprojects/wrangler
>         url = git://github.com/
>
> Then your adapted command:
>
> git -c url.https://internal.insteadOf git://github.com/ \
>         -c submodule.record-rewritten-urls=add,internal --add submodule
> update
>
> would produce
>
> [submodule "subprojects/projname"]
>         path = subprojects/projname
>         remote.origin.url = git://github.com/
>         remote.internal.url =https://internal.insteadOf
>
> Or similar support.

I think this was avoided until now as it would rewrite/add history.
So what if you want ot "just mirror" a large project with a lot
of submodules? You would want to do that without touching
the history, hence we'd need to do such a configuration in a separate
place. IIRC there was a proposal to have a ref e.g.
"refs/submodule/config", that can overwrite/extend the .gitmodules
file with your own configuration. It is a ref, such that mirroring would
work, but not part of the main history, such that yoiu can still change it.

I think to get it right we need to enable a workflow that allows easy
"multi-step" mirroring, e.g. A (source of truth) can be mirrored to B,
who can overlay the .gitmodules to point to server-B, which then can
be mirrored by C, who can have its own serverC.

When C forgot to configure all the submodules, it should fall back to
serverB as that was closest, and if that is unconfigured it should
fallback to A IMO.

>
>>>> [remote "origin"]
>>>>     url = https://github.com/..
>>>> [remote "inhouse"]
>>>>     url = https://inhouse.corp/..
>>>>
>>>> But where do we clone it from?
>>>> (Or do we just do a "git init" on that submodule and fetch
>>>> from both remotes? in which order?)
>>>
>>> origin by default and inhouse if specified. There is already a implied
>>> default (origin). The idea was not to do both but rather what is
>>> specified.
>>> Origin and inhouse are just names for remotes. If one wanted a
>>> "--all-remotes" could pull from everywhere in the Ether if feature was to
>>> be
>>> implemented.
>>
>> How is origin implied to be the default?
>> Should there be an order (e.g. if you cannot find it at inhouse get it
>> from github,
>> if they are down, get it from kernel.org)
>
> As it is in the Highlander series... "there can be only one" (remote).   So
> that is what I mean by origin.  The only remote allowed is the "origin"
> unless changed by the user... but there can still only be one *currently*.
> Though I see your point as it is not labeled "origin".  It is not labeled at
> all.  Apologies for confusion there.

"origin" is just a common name for a remote, like "master" is a common name
for branches. There is nothing inherently special about these except for
their automatic setup after cloning/initializing a repo.

So you can delete the master branch (which I did in a project
that I am not the authoritative maintainer of; I only have feature
branches), and
the repository just works fine.

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #03; Thu, 19)
From: Thomas Braun @ 2017-01-24 18:56 UTC (permalink / raw)
  To: git, Junio C Hamano
In-Reply-To: <xmqqtw8uy62m.fsf@gitster.mtv.corp.google.com>

> Junio C Hamano <gitster@pobox.com> hat am 20. Januar 2017 um 00:37
> geschrieben:

[snip]
  
> * rh/mergetool-regression-fix (2017-01-10) 14 commits
>   (merged to 'next' on 2017-01-10 at e8e00c798b)
>  + mergetool: fix running in subdir when rerere enabled
>  + mergetool: take the "-O" out of $orderfile
>  + t7610: add test case for rerere+mergetool+subdir bug
>  + t7610: spell 'git reset --hard' consistently
>  + t7610: don't assume the checked-out commit
>  + t7610: always work on a test-specific branch
>  + t7610: delete some now-unnecessary 'git reset --hard' lines
>  + t7610: run 'git reset --hard' after each test to clean up
>  + t7610: don't rely on state from previous test
>  + t7610: use test_when_finished for cleanup tasks
>  + t7610: move setup code to the 'setup' test case
>  + t7610: update branch names to match test number
>  + rev-parse doc: pass "--" to rev-parse in the --prefix example
>  + .mailmap: record canonical email for Richard Hansen
> 
>  "git mergetool" without any pathspec on the command line that is
>  run from a subdirectory became no-op in Git v2.11 by mistake, which
>  has been fixed.

Hi Junio, 

Sorry for asking a maybe obvious question.
Will that be merged into maint as well?
It is a regression in 2.11 so I would have expected to see that in maint.

Thanks,
Thomas

^ permalink raw reply

* Re: [RFC 1/2] grep: only add delimiter if there isn't one already
From: Jakub Narębski @ 2017-01-24 19:07 UTC (permalink / raw)
  To: Stefan Hajnoczi, Junio C Hamano; +Cc: git
In-Reply-To: <20170123131551.GL29186@stefanha-x1.localdomain>

W dniu 23.01.2017 o 14:15, Stefan Hajnoczi pisze:
> On Fri, Jan 20, 2017 at 10:16:31AM -0800, Junio C Hamano wrote:

>> My only piece of advice to folks who feel that way is to learn Git
>> more and get comfortable.  You can do neat things like
>>
>>    $ git grep -e pattern rev -- t ':!t/helper/'
>>
>> that you cannot do with "rev:t", for example ;-)
> 
> Neat, thanks for showing the path exclusion syntax.  I wasn't aware of
> it.

That reminds me of mu TODO item: moving extended pathspec information
from gitglossary(7) manpage (sic!) to to-be-created gitpathspec(7).

-- 
Jakub Narębski


^ permalink raw reply

* Re: [PATCH 1/3] Documentation/stash: remove mention of git reset --hard
From: Jakub Narębski @ 2017-01-24 19:51 UTC (permalink / raw)
  To: Thomas Gummerer, git
  Cc: Stephan Beyer, Junio C Hamano, Marc Strapetz, Jeff King,
	Johannes Schindelin
In-Reply-To: <20170121200804.19009-2-t.gummerer@gmail.com>

W dniu 21.01.2017 o 21:08, Thomas Gummerer pisze:
> Don't mention git reset --hard in the documentation for git stash save.
> It's an implementation detail that doesn't matter to the end user and
> thus shouldn't be exposed to them.
> 
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  Documentation/git-stash.txt | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 2e9cef06e6..0ad5335a3e 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -47,8 +47,9 @@ OPTIONS
>  
>  save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q|--quiet] [<message>]::
>  
> -	Save your local modifications to a new 'stash', and run `git reset
> -	--hard` to revert them.  The <message> part is optional and gives
> +	Save your local modifications to a new 'stash', and revert the
> +	the changes in the working tree to match the index.

I think the following might be better:

        ..., and set the working tree to match the index.

Or not, as it ignores problem of untracked files.

Anyway, removing internal implementation detail looks like a good idea.
OTOH the reader should be familiar with what `git reset --hard` does,
and if not, he knows where to find the information.

> +	The <message> part is optional and gives
>  	the description along with the stashed state.  For quickly making
>  	a snapshot, you can omit _both_ "save" and <message>, but giving
>  	only <message> does not trigger this action to prevent a misspelled
> 

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH 1/3] Documentation/stash: remove mention of git reset --hard
From: Jeff King @ 2017-01-24 20:14 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Stephan Beyer, Junio C Hamano, Marc Strapetz,
	Johannes Schindelin
In-Reply-To: <20170121200804.19009-2-t.gummerer@gmail.com>

On Sat, Jan 21, 2017 at 08:08:02PM +0000, Thomas Gummerer wrote:

> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 2e9cef06e6..0ad5335a3e 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -47,8 +47,9 @@ OPTIONS
>  
>  save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q|--quiet] [<message>]::
>  
> -	Save your local modifications to a new 'stash', and run `git reset
> -	--hard` to revert them.  The <message> part is optional and gives
> +	Save your local modifications to a new 'stash', and revert the
> +	the changes in the working tree to match the index.
> +	The <message> part is optional and gives

Hrm. "git reset --hard" doesn't just make the working tree match the
index. It also resets the index to HEAD.  So either the original or your
new description is wrong.

I think it's the latter. We really do reset the index unless
--keep-index is specified.

I also wondered if it was worth avoiding the word "revert", as "git
revert" has a much different meaning (as opposed to "svn revert", which
does what you're talking about here). But I see that "git add -i"
already uses the word revert in this way (and there are probably
others).

So it may not be worth worrying about, but "set" or "reset" probably
serves the same purpose.

-Peff

^ permalink raw reply

* Re: [PATCH 08/12] add oidset API
From: Ramsay Jones @ 2017-01-24 20:26 UTC (permalink / raw)
  To: Jeff King, git
In-Reply-To: <20170124004647.3o26ionfq3td2irf@sigill.intra.peff.net>



On 24/01/17 00:46, Jeff King wrote:
> This is similar to many of our uses of sha1-array, but it
> overcomes one limitation of a sha1-array: when you are
> de-duplicating a large input with relatively few unique
> entries, sha1-array uses 20 bytes per non-unique entry.
> Whereas this set will use memory linear in the number of
> unique entries (albeit a few more than 20 bytes due to
> hashmap overhead).
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This may be overkill. You can get roughly the same thing by making
> actual object structs via lookup_unknown_object(). But see the next
> patch for some comments on that.
> 
>  Makefile |  1 +
>  oidset.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  oidset.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 95 insertions(+)
>  create mode 100644 oidset.c
>  create mode 100644 oidset.h
> 
> diff --git a/Makefile b/Makefile
> index 27afd0f37..e41efc2d8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -774,6 +774,7 @@ LIB_OBJS += notes-cache.o
>  LIB_OBJS += notes-merge.o
>  LIB_OBJS += notes-utils.o
>  LIB_OBJS += object.o
> +LIB_OBJS += oidset.o
>  LIB_OBJS += pack-bitmap.o
>  LIB_OBJS += pack-bitmap-write.o
>  LIB_OBJS += pack-check.o
> diff --git a/oidset.c b/oidset.c
> new file mode 100644
> index 000000000..6094cff8c
> --- /dev/null
> +++ b/oidset.c
> @@ -0,0 +1,49 @@
> +#include "cache.h"
> +#include "oidset.h"
> +
> +struct oidset_entry {
> +	struct hashmap_entry hash;
> +	struct object_id oid;
> +};
> +
> +int oidset_hashcmp(const void *va, const void *vb,

static int oidset_hashcmp( ...

ATB,
Ramsay Jones


^ permalink raw reply

* Re: [PATCH 08/12] add oidset API
From: Jeff King @ 2017-01-24 20:35 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: git
In-Reply-To: <944ea1f8-8f9c-cc17-02a5-a73cb6565b45@ramsayjones.plus.com>

On Tue, Jan 24, 2017 at 08:26:40PM +0000, Ramsay Jones wrote:

> > +++ b/oidset.c
> > @@ -0,0 +1,49 @@
> > +#include "cache.h"
> > +#include "oidset.h"
> > +
> > +struct oidset_entry {
> > +	struct hashmap_entry hash;
> > +	struct object_id oid;
> > +};
> > +
> > +int oidset_hashcmp(const void *va, const void *vb,
> 
> static int oidset_hashcmp( ...

Yep, thanks.

-Peff

^ permalink raw reply

* Re: [DEMO][PATCH v2 6/5] compat: add a qsort_s() implementation based on GNU's qsort_r(1)
From: Jeff King @ 2017-01-24 20:39 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git List, Johannes Schindelin
In-Reply-To: <4e416167-2a33-0943-5738-79b4da5f2c11@web.de>

On Tue, Jan 24, 2017 at 07:00:03PM +0100, René Scharfe wrote:

> > I do worry about having to support more implementations in the
> > future that have different function signature for the comparison
> > callbacks, which will make things ugly, but this addition alone
> > doesn't look too bad to me.
> 
> It is unfair of me to show a 5% speedup and then recommend to not
> include it. ;-)  That difference won't be measurable in real use cases
> and the patch is not necessary.  This patch is simple, but the overall
> complexity (incl. #ifdefs etc.) will be lower without it.

I care less about squeezing out the last few percent performance and
more that somebody libc qsort_r() might offer some other improvement.
For instance, it could sort in-place to lower memory use for some cases,
or do some clever thing that gives more than a few percent in the real
world (something like TimSort).

I don't know to what degree we should care about that.

> But here's another one, with even higher performance and with an even
> bigger recommendation to not include it! :)  It veers off into another
> direction: Parallel execution.  It requires thread-safe comparison
> functions, which might surprise callers.  The value 1000 for the minimum
> number of items before threading kicks in is just a guess, not based on
> measurements.  So it's quite raw -- and I'm not sure why it's still a
> bit slower than sort(1).

Fun, but probably insane for our not-very-threadsafe code base. :)

-Peff

^ permalink raw reply

* Re: [PATCH v2 0/5] string-list: make string_list_sort() reentrant
From: Jeff King @ 2017-01-24 20:41 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Johannes Schindelin
In-Reply-To: <b333ecd4-a147-904d-b1ce-b49179c4ad26@web.de>

On Tue, Jan 24, 2017 at 07:00:07PM +0100, René Scharfe wrote:

> Am 24.01.2017 um 00:54 schrieb Jeff King:
> > The speed looks like a reasonable outcome. I'm torn on the qsort_r()
> > demo patch. I don't think it looks too bad. OTOH, I don't think I would
> > want to deal with the opposite-argument-order versions.
> 
> The code itself may look OK, but it's not really necessary and the special
> implementation for Linux makes increases maintenance costs.  Can we save it
> for later and first give the common implemention a chance to prove itself?

Sure, I'm OK with leaving it out for now.

> > Is there any interest in people adding the ISO qsort_s() to their libc
> > implementations? It seems like it's been a fair number of years by now.
> 
> https://sourceware.org/ml/libc-alpha/2014-12/msg00513.html is the last post
> mentioning qsort_s on the glibc mailing list, but it didn't even make it
> into https://sourceware.org/glibc/wiki/Development_Todo/Master.
> Not sure what's planned in BSD land, didn't find anything (but didn't look
> too hard).

So it sounds like "no, not really". I think that's OK. I was mostly
curious if we could expect our custom implementation to age out over
time.

-Peff

^ permalink raw reply

* Re: [RFC 1/2] grep: only add delimiter if there isn't one already
From: Philip Oakley @ 2017-01-24 20:48 UTC (permalink / raw)
  To: Stefan Hajnoczi, Junio C Hamano, Jakub Narębski; +Cc: git
In-Reply-To: <f90eba2a-ebfa-67f0-68c4-abacb05759ba@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 836 bytes --]

From: "Jakub Narębski" <jnareb@gmail.com>
>W dniu 23.01.2017 o 14:15, Stefan Hajnoczi pisze:
>> On Fri, Jan 20, 2017 at 10:16:31AM -0800, Junio C Hamano wrote:
>
>>> My only piece of advice to folks who feel that way is to learn Git
>>> more and get comfortable.  You can do neat things like
>>>
>>>    $ git grep -e pattern rev -- t ':!t/helper/'
>>>
>>> that you cannot do with "rev:t", for example ;-)
>>
>> Neat, thanks for showing the path exclusion syntax.  I wasn't aware of
>> it.
>
> That reminds me of mu TODO item: moving extended pathspec information
> from gitglossary(7) manpage (sic!) to to-be-created gitpathspec(7).
>

Good to see someone else also had it on a ToDo list..

Attached is my collation of all the different path spec info I found from 
trawling the man & guide pages to satisfy my ignorance...
--
Philip 

[-- Attachment #2: gitpathspec.txt --]
[-- Type: text/plain, Size: 12282 bytes --]

gitpathspec(7)
============

NAME
----
gitpathspec - How to specify a path or file to git

SYNOPSIS
--------
$HOME/.config/git/ignore, $GIT_DIR/info/exclude, .gitignore

DESCRIPTION
-----------

Pathspecs are used in a range of git functions.
.gitignore
.gitexclude
gitsparse
git-add -- pathspec
git-checkout -- pathspec (after the double-dash)
git grep (active/non-active wild card matching)
git log (L#:<> pathspec limiters ?? what does it mean)

git rerere (uncontentious)
git status (uncontentious)
gitk (uncontentious, but see 'log' above)
'git' itself --
--literal-pathspecs
Treat pathspecs literally (i.e. no globbing, no pathspec magic). This is equivalent to setting the GIT_LITERAL_PATHSPECS environment variable to 1.

--glob-pathspecs
Add "glob" magic to all pathspec. This is equivalent to setting the GIT_GLOB_PATHSPECS environment variable to 1. Disabling globbing on individual pathspecs can be done using pathspec magic ":(literal)"

--noglob-pathspecs
Add "literal" magic to all pathspec. This is equivalent to setting the GIT_NOGLOB_PATHSPECS environment variable to 1. Enabling globbing on individual pathspecs can be done using pathspec magic ":(glob)"

--icase-pathspecs
Add "icase" magic to all pathspec. This is equivalent to setting the GIT_ICASE_PATHSPECS environment variable to 1.

see glossary-content
pathspec
Pattern used to limit paths in Git commands.

Pathspecs are used on the command line of "git ls-files", "git ls-tree", "git add", "git grep", "git diff", "git checkout", and many other commands to limit the scope of operations to some subset of the tree or worktree. See the documentation of each command for whether paths are relative to the current directory or toplevel. The pathspec syntax is as follows:

any path matches itself

the pathspec up to the last slash represents a directory prefix. The scope of that pathspec is limited to that subtree.

the rest of the pathspec is a pattern for the remainder of the pathname. Paths relative to the directory prefix will be matched against that pattern using fnmatch(3); in particular, * and ? can match directory separators.

For example, Documentation/*.jpg will match all .jpg files in the Documentation subtree, including Documentation/chapter_1/figure_1.jpg.

A pathspec that begins with a colon : has special meaning. In the short form, the leading colon : is followed by zero or more "magic signature" letters (which optionally is terminated by another colon :), and the remainder is the pattern to match against the path. The "magic signature" consists of ASCII symbols that are neither alphanumeric, glob, regex special charaters nor colon. The optional colon that terminates the "magic signature" can be omitted if the pattern begins with a character that does not belong to "magic signature" symbol set and is not a colon.

In the long form, the leading colon : is followed by a open parenthesis (, a comma-separated list of zero or more "magic words", and a close parentheses ), and the remainder is the pattern to match against the path.

A pathspec with only a colon means "there is no pathspec". This form should not be combined with other pathspec.

top
The magic word top (magic signature: /) makes the pattern match from the root of the working tree, even when you are running the command from inside a subdirectory.

literal
Wildcards in the pattern such as * or ? are treated as literal characters.

icase
Case insensitive match.

glob
Git treats the pattern as a shell glob suitable for consumption by fnmatch(3) with the FNM_PATHNAME flag: wildcards in the pattern will not match a / in the pathname. For example, "Documentation/*.html" matches "Documentation/git.html" but not "Documentation/ppc/ppc.html" or "tools/perf/Documentation/perf.html".

Two consecutive asterisks ("**") in patterns matched against full pathname may have special meaning:

A leading "**" followed by a slash means match in all directories. For example, "**/foo" matches file or directory "foo" anywhere, the same as pattern "foo". "**/foo/bar" matches file or directory "bar" anywhere that is directly under directory "foo".

A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.

A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.

Other consecutive asterisks are considered invalid.

Glob magic is incompatible with literal magic.

exclude
After a path matches any non-exclude pathspec, it will be run through all exclude pathspec (magic signature: !). If it matches, the path is ignored.

Which characters to escape(\) *:\/ ??

rooting (/*) of a path (a) $GIT_DIR (b) system root.
directory (/) terminator (D/F conflict)

Compare with <path>; <paths>; and <file> 

[glob(7) patterns] (see grep)

` file specifies intentionally untracked files that
Git should ignore.
Files already tracked by Git are not affected; see the NOTES
below for details.

a `pathspec` is specified by a pattern.
When deciding whether a path matches a `pathspec` pattern, Git normally checks
with the following
order of precedence, from highest to lowest (within one level of
precedence, the last matching pattern decides the outcome):

 * Patterns read from the command line for those commands that support
   them.

 * Patterns read from a `.pathspec` file in the same directory
   as the path, or in any parent directory, with patterns in the
   higher level files (up to the toplevel of the work tree) being overridden
   by those in lower level files down to the directory containing the file.
   These patterns match relative to the location of the
   `.pathspec` file.  A project normally includes such
   `.pathspec` files in its repository, containing patterns for
   files generated as part of the project build.

 * Patterns read from `$GIT_DIR/info/exclude`.

 * Patterns read from the file specified by the configuration
   variable 'core.excludesfile'.

Which file to place a pattern in depends on how the pattern is meant to
be used.

 * Patterns which should be version-controlled and distributed to
   other repositories via clone (i.e., files that all developers will want
   to ignore) should go into a `.pathspec` file.

 * Patterns which are
   specific to a particular repository but which do not need to be shared
   with other related repositories (e.g., auxiliary files that live inside
   the repository but are specific to one user's workflow) should go into
   the `$GIT_DIR/info/exclude` file.

 * Patterns which a user wants Git to
   ignore in all situations (e.g., backup or temporary files generated by
   the user's editor of choice) generally go into a file specified by
   `core.excludesfile` in the user's `~/.gitconfig`. Its default value is
   $XDG_CONFIG_HOME/git/ignore. If $XDG_CONFIG_HOME is either not set or
   empty, $HOME/.config/git/ignore is used instead.

The underlying Git plumbing tools, such as
'git ls-files' and 'git read-tree', read
`pathspec` patterns specified by command-line options, or from
files specified by command-line options.  Higher-level Git
tools, such as 'git status' and 'git add',
use patterns from the sources specified above.

PATTERN FORMAT
--------------

 - A blank line matches no files, so it can serve as a separator
   for readability.

 - A line starting with # serves as a comment.
   Put a backslash ("`\`") in front of the first hash for patterns
   that begin with a hash.

 - An optional prefix "`!`" which negates the pattern; any
   matching file excluded by a previous pattern will become
   included again. It is not possible to re-include a file if a parent
   directory of that file is excluded. Git doesn't list excluded
   directories for performance reasons, so any patterns on contained
   files have no effect, no matter where they are defined.
   Put a backslash ("`\`") in front of the first "`!`" for patterns
   that begin with a literal "`!`", for example, "`\!important!.txt`".

 - If the pattern ends with a slash, it is removed for the
   purpose of the following description, but it would only find
   a match with a directory.  In other words, `foo/` will match a
   directory `foo` and paths underneath it, but will not match a
   regular file or a symbolic link `foo` (this is consistent
   with the way how pathspec works in general in Git).

 - If the pattern does not contain a slash '/', Git treats it as
   a shell glob pattern and checks for a match against the
   pathname relative to the location of the `.pathspec` file
   (relative to the toplevel of the work tree if not from a
   `.pathspec` file).

 - Otherwise, Git treats the pattern as a shell glob suitable
   for consumption by fnmatch(3) with the FNM_PATHNAME flag:
   wildcards in the pattern will not match a / in the pathname.
   For example, "Documentation/{asterisk}.html" matches
   "Documentation/git.html" but not "Documentation/ppc/ppc.html"
   or "tools/perf/Documentation/perf.html".

 - A leading slash matches the beginning of the pathname.
   For example, "/{asterisk}.c" matches "cat-file.c" but not
   "mozilla-sha1/sha1.c".

Two consecutive asterisks ("`**`") in patterns matched against
full pathname may have special meaning:

 - A leading "`**`" followed by a slash means match in all
   directories. For example, "`**/foo`" matches file or directory
   "`foo`" anywhere, the same as pattern "`foo`". "`**/foo/bar`"
   matches file or directory "`bar`" anywhere that is directly
   under directory "`foo`".

 - A trailing "`/**`" matches everything inside. For example,
   "`abc/**`" matches all files inside directory "`abc`", relative
   to the location of the `.pathspec` file, with infinite depth.

 - A slash followed by two consecutive asterisks then a slash
   matches zero or more directories. For example, "`a/**/b`"
   matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.

 - Other consecutive asterisks are considered invalid.

NOTES
-----

The purpose of pathspec files is to ensure that certain files
not tracked by Git remain untracked.

To ignore uncommitted changes in a file that is already tracked,
use 'git update-index {litdd}assume-unchanged'.

To stop tracking a file that is currently tracked, use
'git rm --cached'.

EXAMPLES
--------

--------------------------------------------------------------
    $ git status
    [...]
    # Untracked files:
    [...]
    #       Documentation/foo.html
    #       Documentation/pathspec.html
    #       file.o
    #       lib.a
    #       src/internal.o
    [...]
    $ cat .git/info/exclude
    # ignore objects and archives, anywhere in the tree.
    *.[oa]
    $ cat Documentation/.pathspec
    # ignore generated html files,
    *.html
    # except foo.html which is maintained by hand
    !foo.html
    $ git status
    [...]
    # Untracked files:
    [...]
    #       Documentation/foo.html
    [...]
--------------------------------------------------------------

Another example:

--------------------------------------------------------------
    $ cat .pathspec
    vmlinux*
    $ ls arch/foo/kernel/vm*
    arch/foo/kernel/vmlinux.lds.S
    $ echo '!/vmlinux*' >arch/foo/kernel/.pathspec
--------------------------------------------------------------

The second .pathspec prevents Git from ignoring
`arch/foo/kernel/vmlinux.lds.S`.

Example to exclude everything except a specific directory `foo/bar`
(note the `/*` - without the slash, the wildcard would also exclude
everything within `foo/bar`):

--------------------------------------------------------------
    $ cat .pathspec
    # exclude everything except directory foo/bar
    /*
    !/foo
    /foo/*
    !/foo/bar
--------------------------------------------------------------

SEE ALSO
--------
linkgit:git-rm[1],
linkgit:git-update-index[1],
linkgit:gitrepository-layout[5],
linkgit:git-check-ignore[1]

GIT
---
Part of the linkgit:git[1] suite

^ permalink raw reply

* [PATCH] submodule absorbing: fix worktree/gitdir pointers recursively for non-moves
From: Stefan Beller @ 2017-01-24 21:03 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, peff, Stefan Beller

Consider having a submodule 'sub' and a nested submodule at 'sub/nested'.
When nested is already absorbed into sub, but sub is not absorbed into
its superproject, then we need to fixup the gitfile and core.worktree
setting for 'nested' when absorbing 'sub', but we do not need to move
its git dir around.

Previously 'nested's gitfile contained "gitdir: ../.git/modules/nested";
it has to be corrected to "gitdir: ../../.git/modules/sub1/modules/nested".

An alternative I considered to do this work lazily, i.e. when resolving
"../.git/modules/nested", we would notice the ".git" being a gitfile
linking to another path.  That seemed to be robuster by design, but harder
to get the implementation right.  Maybe we have to do that anyway once we
try to have submodules and worktrees working nicely together, but for now
just produce 'correct' (i.e. direct) pointers.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

  cc Jeff and Brandon, who worked on the setup code recently,
  and the alternative design mentioned was messing around a lot in setup.c.
  
  Thanks,
  Stefan

 submodule.c                        | 41 +++++++++++++++++++-------------------
 t/t7412-submodule-absorbgitdirs.sh | 27 +++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/submodule.c b/submodule.c
index 4c4f033e8a..7deb0fca6a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1393,16 +1393,9 @@ static void relocate_single_git_dir_into_superproject(const char *prefix,
 	char *old_git_dir = NULL, *real_old_git_dir = NULL, *real_new_git_dir = NULL;
 	const char *new_git_dir;
 	const struct submodule *sub;
-
-	if (submodule_uses_worktrees(path))
-		die(_("relocate_gitdir for submodule '%s' with "
-		      "more than one worktree not supported"), path);
+	int err_code;
 
 	old_git_dir = xstrfmt("%s/.git", path);
-	if (read_gitfile(old_git_dir))
-		/* If it is an actual gitfile, it doesn't need migration. */
-		return;
-
 	real_old_git_dir = real_pathdup(old_git_dir);
 
 	sub = submodule_from_path(null_sha1, path);
@@ -1414,6 +1407,24 @@ static void relocate_single_git_dir_into_superproject(const char *prefix,
 		die(_("could not create directory '%s'"), new_git_dir);
 	real_new_git_dir = real_pathdup(new_git_dir);
 
+	if (read_gitfile_gently(old_git_dir, &err_code) ||
+	    err_code == READ_GITFILE_ERR_NOT_A_REPO) {
+		/*
+		 * If it is an actual gitfile, it doesn't need migration,
+		 * however in case of a recursively nested submodule, the
+		 * gitfile content may be stale, as its superproject
+		 * (which may be a submodule of another superproject)
+		 * may have been moved. So expect a bogus pointer to be read,
+		 * which materializes as error READ_GITFILE_ERR_NOT_A_REPO.
+		 */
+		connect_work_tree_and_git_dir(path, real_new_git_dir);
+		return;
+	}
+
+	if (submodule_uses_worktrees(path))
+		die(_("relocate_gitdir for submodule '%s' with "
+		      "more than one worktree not supported"), path);
+
 	if (!prefix)
 		prefix = get_super_prefix();
 
@@ -1437,22 +1448,14 @@ void absorb_git_dir_into_superproject(const char *prefix,
 				      const char *path,
 				      unsigned flags)
 {
-	const char *sub_git_dir, *v;
-	char *real_sub_git_dir = NULL, *real_common_git_dir = NULL;
 	struct strbuf gitdir = STRBUF_INIT;
-
 	strbuf_addf(&gitdir, "%s/.git", path);
-	sub_git_dir = resolve_gitdir(gitdir.buf);
 
 	/* Not populated? */
-	if (!sub_git_dir)
+	if (!file_exists(gitdir.buf))
 		goto out;
 
-	/* Is it already absorbed into the superprojects git dir? */
-	real_sub_git_dir = real_pathdup(sub_git_dir);
-	real_common_git_dir = real_pathdup(get_git_common_dir());
-	if (!skip_prefix(real_sub_git_dir, real_common_git_dir, &v))
-		relocate_single_git_dir_into_superproject(prefix, path);
+	relocate_single_git_dir_into_superproject(prefix, path);
 
 	if (flags & ABSORB_GITDIR_RECURSE_SUBMODULES) {
 		struct child_process cp = CHILD_PROCESS_INIT;
@@ -1481,6 +1484,4 @@ void absorb_git_dir_into_superproject(const char *prefix,
 
 out:
 	strbuf_release(&gitdir);
-	free(real_sub_git_dir);
-	free(real_common_git_dir);
 }
diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
index 1c47780e2b..e2bbb449b6 100755
--- a/t/t7412-submodule-absorbgitdirs.sh
+++ b/t/t7412-submodule-absorbgitdirs.sh
@@ -64,6 +64,33 @@ test_expect_success 'absorb the git dir in a nested submodule' '
 	test_cmp expect.2 actual.2
 '
 
+test_expect_success 're-setup nested submodule' '
+	# un-absorb the direct submodule, to test if the nested submodule
+	# is still correct (needs a rewrite of the gitfile only)
+	rm -rf sub1/.git &&
+	mv .git/modules/sub1 sub1/.git &&
+	GIT_WORK_TREE=. git -C sub1 config --unset core.worktree &&
+	# fixup the nested submodule
+	echo "gitdir: ../.git/modules/nested" >sub1/nested/.git &&
+	GIT_WORK_TREE=../../../nested git -C sub1/.git/modules/nested config \
+		core.worktree "../../../nested" &&
+	# make sure this re-setup is correct
+	git status --ignore-submodules=none
+'
+
+test_expect_success 'absorb the git dir in a nested submodule' '
+	git status >expect.1 &&
+	git -C sub1/nested rev-parse HEAD >expect.2 &&
+	git submodule absorbgitdirs &&
+	test -f sub1/.git &&
+	test -f sub1/nested/.git &&
+	test -d .git/modules/sub1/modules/nested &&
+	git status >actual.1 &&
+	git -C sub1/nested rev-parse HEAD >actual.2 &&
+	test_cmp expect.1 actual.1 &&
+	test_cmp expect.2 actual.2
+'
+
 test_expect_success 'setup a gitlink with missing .gitmodules entry' '
 	git init sub2 &&
 	test_commit -C sub2 first &&
-- 
2.11.0.486.g67830dbe1c


^ permalink raw reply related

* Re: [PATCH] difftool.c: mark a file-local symbol with static
From: Junio C Hamano @ 2017-01-24 21:52 UTC (permalink / raw)
  To: Jeff King
  Cc: David Aguilar, Ramsay Jones, Johannes Schindelin,
	GIT Mailing-list
In-Reply-To: <20170124142346.u3d7l6772mtkgpcf@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

>> > As ugly as warning("%s", "") is, I think it may be the thing that annoys
>> > the smallest number of people.
>> > 
>> > -Peff
>> 
>> How about using warning(" ") instead?
>> 
>> For difftool.c specifically, the following is a fine solution,
>> and doesn't require that we change our warning flags just for
>> this one file.
>
> I dunno. As ugly as the "%s" thing is in the source, at least it doesn't
> change the output. Not that an extra space is the end of the world, but
> it seems like it's letting the problem escape from the source code.
>
> Do people still care about resolving this? -Wno-format-zero-length is in
> the DEVELOPER options. It wasn't clear to me if that was sufficient, or
> if we're going to get a bunch of reports from people that need to be
> directed to the right compiler options.

I view both as ugly, but probably "%s", "" is lessor of the two
evils.

Perhaps

	#define JUST_SHOW_EMPTY_LINE "%s", ""

		...
		warning(JUST_SHOW_EMPTY_LINE);
                ...

or something silly like that?


^ permalink raw reply

* Re: [PATCH] submodule absorbing: fix worktree/gitdir pointers recursively for non-moves
From: Brandon Williams @ 2017-01-24 21:58 UTC (permalink / raw)
  To: Stefan Beller; +Cc: gitster, git, peff
In-Reply-To: <20170124210346.12060-1-sbeller@google.com>

On 01/24, Stefan Beller wrote:
> +	if (read_gitfile_gently(old_git_dir, &err_code) ||
> +	    err_code == READ_GITFILE_ERR_NOT_A_REPO) {
> +		/*
> +		 * If it is an actual gitfile, it doesn't need migration,
> +		 * however in case of a recursively nested submodule, the
> +		 * gitfile content may be stale, as its superproject
> +		 * (which may be a submodule of another superproject)
> +		 * may have been moved. So expect a bogus pointer to be read,
> +		 * which materializes as error READ_GITFILE_ERR_NOT_A_REPO.
> +		 */
> +		connect_work_tree_and_git_dir(path, real_new_git_dir);

So connect_work_tree_and_git_dir() will update the .gitfile if it is
stale.

> +		return;
> +	}
> +
> +	if (submodule_uses_worktrees(path))
> +		die(_("relocate_gitdir for submodule '%s' with "
> +		      "more than one worktree not supported"), path);

No current support for worktrees (yet!).

> +
>  	if (!prefix)
>  		prefix = get_super_prefix();
>  
> @@ -1437,22 +1448,14 @@ void absorb_git_dir_into_superproject(const char *prefix,
>  				      const char *path,
>  				      unsigned flags)
>  {
> -	const char *sub_git_dir, *v;
> -	char *real_sub_git_dir = NULL, *real_common_git_dir = NULL;
>  	struct strbuf gitdir = STRBUF_INIT;
> -
>  	strbuf_addf(&gitdir, "%s/.git", path);
> -	sub_git_dir = resolve_gitdir(gitdir.buf);
>  
>  	/* Not populated? */
> -	if (!sub_git_dir)
> +	if (!file_exists(gitdir.buf))
>  		goto out;

There should be a is_submodule_populated() function now, maybe
we should start using it when performing population checks?

>  
> -	/* Is it already absorbed into the superprojects git dir? */
> -	real_sub_git_dir = real_pathdup(sub_git_dir);
> -	real_common_git_dir = real_pathdup(get_git_common_dir());
> -	if (!skip_prefix(real_sub_git_dir, real_common_git_dir, &v))
> -		relocate_single_git_dir_into_superproject(prefix, path);
> +	relocate_single_git_dir_into_superproject(prefix, path);

So the check was just pushed into the relocation function.

>  
>  	if (flags & ABSORB_GITDIR_RECURSE_SUBMODULES) {
>  		struct child_process cp = CHILD_PROCESS_INIT;
> @@ -1481,6 +1484,4 @@ void absorb_git_dir_into_superproject(const char *prefix,
>  
>  out:
>  	strbuf_release(&gitdir);
> -	free(real_sub_git_dir);
> -	free(real_common_git_dir);
>  }
> diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
> index 1c47780e2b..e2bbb449b6 100755
> --- a/t/t7412-submodule-absorbgitdirs.sh
> +++ b/t/t7412-submodule-absorbgitdirs.sh
> @@ -64,6 +64,33 @@ test_expect_success 'absorb the git dir in a nested submodule' '
>  	test_cmp expect.2 actual.2
>  '
>  
> +test_expect_success 're-setup nested submodule' '
> +	# un-absorb the direct submodule, to test if the nested submodule
> +	# is still correct (needs a rewrite of the gitfile only)
> +	rm -rf sub1/.git &&
> +	mv .git/modules/sub1 sub1/.git &&
> +	GIT_WORK_TREE=. git -C sub1 config --unset core.worktree &&
> +	# fixup the nested submodule
> +	echo "gitdir: ../.git/modules/nested" >sub1/nested/.git &&
> +	GIT_WORK_TREE=../../../nested git -C sub1/.git/modules/nested config \
> +		core.worktree "../../../nested" &&
> +	# make sure this re-setup is correct
> +	git status --ignore-submodules=none
> +'
> +
> +test_expect_success 'absorb the git dir in a nested submodule' '
> +	git status >expect.1 &&
> +	git -C sub1/nested rev-parse HEAD >expect.2 &&
> +	git submodule absorbgitdirs &&
> +	test -f sub1/.git &&
> +	test -f sub1/nested/.git &&
> +	test -d .git/modules/sub1/modules/nested &&
> +	git status >actual.1 &&
> +	git -C sub1/nested rev-parse HEAD >actual.2 &&
> +	test_cmp expect.1 actual.1 &&
> +	test_cmp expect.2 actual.2
> +'
> +
>  test_expect_success 'setup a gitlink with missing .gitmodules entry' '
>  	git init sub2 &&
>  	test_commit -C sub2 first &&
> -- 
> 2.11.0.486.g67830dbe1c


Aside from my one question the rest of this looks good to me.

-- 
Brandon Williams

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #03; Thu, 19)
From: Junio C Hamano @ 2017-01-24 21:59 UTC (permalink / raw)
  To: Thomas Braun; +Cc: git
In-Reply-To: <1929244236.1844627.1485284187492.JavaMail.open-xchange@app04.ox.hosteurope.de>

Thomas Braun <thomas.braun@virtuell-zuhause.de> writes:

>> * rh/mergetool-regression-fix (2017-01-10) 14 commits
>>   (merged to 'next' on 2017-01-10 at e8e00c798b)
>>  + mergetool: fix running in subdir when rerere enabled
>>  + mergetool: take the "-O" out of $orderfile
>>  + t7610: add test case for rerere+mergetool+subdir bug
>>  + t7610: spell 'git reset --hard' consistently
>>  + t7610: don't assume the checked-out commit
>>  + t7610: always work on a test-specific branch
>>  + t7610: delete some now-unnecessary 'git reset --hard' lines
>>  + t7610: run 'git reset --hard' after each test to clean up
>>  + t7610: don't rely on state from previous test
>>  + t7610: use test_when_finished for cleanup tasks
>>  + t7610: move setup code to the 'setup' test case
>>  + t7610: update branch names to match test number
>>  + rev-parse doc: pass "--" to rev-parse in the --prefix example
>>  + .mailmap: record canonical email for Richard Hansen
> ...
> Sorry for asking a maybe obvious question.
> Will that be merged into maint as well?

A good way to tell is to compare outputs from these two:

    $ git log --first-parent maint..$tip_of_the_topic
    $ git log --first-parent master..$tip_of_the_topic

I actually wasn't planning to.  We (or distro packagers) may want to
cherry-pick the essential bits from the topic and backport to 'maint'.


^ permalink raw reply

* Re: [PATCH] submodule absorbing: fix worktree/gitdir pointers recursively for non-moves
From: Brandon Williams @ 2017-01-24 22:19 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org, Jeff King
In-Reply-To: <CAGZ79kYKkx441bbU5Oy9Ernb1FmbcTybYbL_M_+yWG_ycfPwrA@mail.gmail.com>

On 01/24, Stefan Beller wrote:
> On Tue, Jan 24, 2017 at 1:58 PM, Brandon Williams <bmwill@google.com> wrote:
> > On 01/24, Stefan Beller wrote:
> >> +     if (read_gitfile_gently(old_git_dir, &err_code) ||
> >> +         err_code == READ_GITFILE_ERR_NOT_A_REPO) {
> >> +             /*
> >> +              * If it is an actual gitfile, it doesn't need migration,
> >> +              * however in case of a recursively nested submodule, the
> >> +              * gitfile content may be stale, as its superproject
> >> +              * (which may be a submodule of another superproject)
> >> +              * may have been moved. So expect a bogus pointer to be read,
> >> +              * which materializes as error READ_GITFILE_ERR_NOT_A_REPO.
> >> +              */
> >> +             connect_work_tree_and_git_dir(path, real_new_git_dir);
> >
> > So connect_work_tree_and_git_dir() will update the .gitfile if it is
> > stale.
> >
> >> +             return;
> >> +     }
> >> +
> >> +     if (submodule_uses_worktrees(path))
> >> +             die(_("relocate_gitdir for submodule '%s' with "
> >> +                   "more than one worktree not supported"), path);
> >
> > No current support for worktrees (yet!).
> >
> >> +
> >>       if (!prefix)
> >>               prefix = get_super_prefix();
> >>
> >> @@ -1437,22 +1448,14 @@ void absorb_git_dir_into_superproject(const char *prefix,
> >>                                     const char *path,
> >>                                     unsigned flags)
> >>  {
> >> -     const char *sub_git_dir, *v;
> >> -     char *real_sub_git_dir = NULL, *real_common_git_dir = NULL;
> >>       struct strbuf gitdir = STRBUF_INIT;
> >> -
> >>       strbuf_addf(&gitdir, "%s/.git", path);
> >> -     sub_git_dir = resolve_gitdir(gitdir.buf);
> >>
> >>       /* Not populated? */
> >> -     if (!sub_git_dir)
> >> +     if (!file_exists(gitdir.buf))
> >>               goto out;
> >
> > There should be a is_submodule_populated() function now, maybe
> > we should start using it when performing population checks?
> 
> Yes I am aware of that, but the problem is we cannot use it here.
> is_submodule_populated[1], just like the code here, uses
> resolve_gitdir, which is
> 
>     const char *resolve_gitdir(const char *suspect)
>     {
>         if (is_git_directory(suspect))
>            return suspect;
>         return read_gitfile(suspect);
>     }
> 
> And there you see the problem: read_gitfile will die on error.
> we'd have to have use read_gitfile_gently(old_git_dir, &err_code),
> and then allow READ_GITFILE_ERR_NOT_A_REPO to go through,
> just as above.

Hmm, then maybe is_submodule_populated should be rewritten to not die on
an error then?

> 
> And that is also the reason why we had to move submodule_uses_worktrees
> down, as it also uses no gentle function to look for a git directory
> (read: it would die as well). When you have bogus content in your
> .git file, there is really nothing you can do to determine if the submodule
> is part of a worktree setup, so it is fine to postpone the check until after we
> fixed up the link.
> 
> So here is the bug you spotted: If it is a worktree already, then
> read_gitfile_gently would work fine, no need to "fix" it.
> 
> I'll resend with logic as follows:
> 
>     char *retvalue = read_gitfile_gently(old_git_dir, &err_code);
>     if (retvalue)
>         // return early; a worktree is fine here, no need to check
>         // because we do nothing
> 
>     if (err_code == READ_GITFILE_ERR_NOT_A_REPO)
>         // connect; then check for worktree and return early;
> 
>     // do the actual relocation.
> 
> 
> [1] as found e.g. at
> https://public-inbox.org/git/1481915002-162130-2-git-send-email-bmwill@google.com/
> 
> >
> >>
> >> -     /* Is it already absorbed into the superprojects git dir? */
> >> -     real_sub_git_dir = real_pathdup(sub_git_dir);
> >> -     real_common_git_dir = real_pathdup(get_git_common_dir());
> >> -     if (!skip_prefix(real_sub_git_dir, real_common_git_dir, &v))
> >> -             relocate_single_git_dir_into_superproject(prefix, path);
> >> +     relocate_single_git_dir_into_superproject(prefix, path);
> >
> > So the check was just pushed into the relocation function.
> 
> The check was pushed down, so we can use the
> connect_work_tree_and_git_dir instead.
> 
> >
> >>
> >>       if (flags & ABSORB_GITDIR_RECURSE_SUBMODULES) {
> >>               struct child_process cp = CHILD_PROCESS_INIT;
> >> @@ -1481,6 +1484,4 @@ void absorb_git_dir_into_superproject(const char *prefix,
> >>
> >>  out:
> >>       strbuf_release(&gitdir);
> >> -     free(real_sub_git_dir);
> >> -     free(real_common_git_dir);
> >>  }
> >> diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
> >> index 1c47780e2b..e2bbb449b6 100755
> >> --- a/t/t7412-submodule-absorbgitdirs.sh
> >> +++ b/t/t7412-submodule-absorbgitdirs.sh
> >> @@ -64,6 +64,33 @@ test_expect_success 'absorb the git dir in a nested submodule' '
> >>       test_cmp expect.2 actual.2
> >>  '
> >>
> >> +test_expect_success 're-setup nested submodule' '
> >> +     # un-absorb the direct submodule, to test if the nested submodule
> >> +     # is still correct (needs a rewrite of the gitfile only)
> >> +     rm -rf sub1/.git &&
> >> +     mv .git/modules/sub1 sub1/.git &&
> >> +     GIT_WORK_TREE=. git -C sub1 config --unset core.worktree &&
> >> +     # fixup the nested submodule
> >> +     echo "gitdir: ../.git/modules/nested" >sub1/nested/.git &&
> >> +     GIT_WORK_TREE=../../../nested git -C sub1/.git/modules/nested config \
> >> +             core.worktree "../../../nested" &&
> >> +     # make sure this re-setup is correct
> >> +     git status --ignore-submodules=none
> >> +'
> >> +
> >> +test_expect_success 'absorb the git dir in a nested submodule' '
> >> +     git status >expect.1 &&
> >> +     git -C sub1/nested rev-parse HEAD >expect.2 &&
> >> +     git submodule absorbgitdirs &&
> >> +     test -f sub1/.git &&
> >> +     test -f sub1/nested/.git &&
> >> +     test -d .git/modules/sub1/modules/nested &&
> >> +     git status >actual.1 &&
> >> +     git -C sub1/nested rev-parse HEAD >actual.2 &&
> >> +     test_cmp expect.1 actual.1 &&
> >> +     test_cmp expect.2 actual.2
> >> +'
> >> +
> >>  test_expect_success 'setup a gitlink with missing .gitmodules entry' '
> >>       git init sub2 &&
> >>       test_commit -C sub2 first &&
> >> --
> >> 2.11.0.486.g67830dbe1c
> >
> >
> > Aside from my one question the rest of this looks good to me.
> >
> > --
> > Brandon Williams

-- 
Brandon Williams

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox