Git development

Git development
 help / color / mirror / Atom feed

* [PATCH v2 03/10] mailmap: remove email copy and length limitation
From: Junio C Hamano @ 2013-01-08  0:10 UTC (permalink / raw)
  To: git; +Cc: Antoine Pelisse
In-Reply-To: <1357603821-8647-1-git-send-email-gitster@pobox.com>

In map_user(), we have email pointer that points at the beginning of
an e-mail address, but the buffer is not terminated with a NUL after
the e-mail address.  It typically has ">" after the address, and it
could have even more if it comes from author/committer line in a
commit object.  Or it may not have ">" after it.

We used to copy the e-mail address proper into a temporary buffer
before asking the string-list API to find the e-mail address in the
mailmap, because string_list_lookup() function only takes a NUL
terminated full string.

Introduce a helper function lookup_prefix that takes the email
pointer and the length, and finds a matching entry in the string
list used for the mailmap, by doing the following:

 - First ask string_list_find_insert_index() where in its sorted
   list the e-mail address we have (including the possible trailing
   junk ">...") would be inserted.

 - It could find an exact match (e.g. we had a clean e-mail address
   without any trailing junk).  We can return the item in that case.

 - Or it could return the index of an item that sorts after the
   e-mail address we have.

 - If we did not find an exact match against a clean e-mail address,
   then the record we are looking for in the mailmap has to exist
   before the index returned by the function (i.e. "email>junk"
   always sorts later than "email").  Iterate, starting from that
   index, down the map->items[] array until we find the exact record
   we are looking for, or we see a record with a key that definitely
   sorts earlier than the e-mail we are looking for (i.e. when we
   are looking for "email" in "email>junk", a record in the mailmap
   that begins with "emaik" strictly sorts before "email", if such a
   key existed in the mailmap).

This, together with the earlier enhancement to support
case-insensitive sorting, allow us to remove an extra copy of email
buffer to downcase it.

A part of this is based on Antoine Pelisse's previous work.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 mailmap.c | 61 +++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 16 deletions(-)

diff --git a/mailmap.c b/mailmap.c
index ea4b471..1a0b769 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -174,6 +174,7 @@ static int read_single_mailmap(struct string_list *map, const char *filename, ch
 int read_mailmap(struct string_list *map, char **repo_abbrev)
 {
 	map->strdup_strings = 1;
+	map->cmp = strcasecmp;
 	/* each failure returns 1, so >1 means both calls failed */
 	return read_single_mailmap(map, ".mailmap", repo_abbrev) +
 	       read_single_mailmap(map, git_mailmap_file, repo_abbrev) > 1;
@@ -187,14 +188,50 @@ void clear_mailmap(struct string_list *map)
 	debug_mm("mailmap: cleared\n");
 }
 
+static struct string_list_item *lookup_prefix(struct string_list *map,
+					      const char *string, size_t len)
+{
+	int i = string_list_find_insert_index(map, string, 1);
+	if (i < 0) {
+		/* exact match */
+		i = -1 - i;
+		/* does it match exactly? */
+		if (!map->items[i].string[len])
+			return &map->items[i];
+	}
+
+	/*
+	 * i is at the exact match to an overlong key, or
+	 * location the possibly overlong key would be inserted,
+	 * which must be after the real location of the key.
+	 */
+	while (0 <= --i && i < map->nr) {
+		int cmp = strncasecmp(map->items[i].string, string, len);
+		if (cmp < 0)
+			/*
+			 * "i" points at a key definitely below the prefix;
+			 * the map does not have string[0:len] in it.
+			 */
+			break;
+		else if (!cmp && !map->items[i].string[len])
+			/* found it */
+			return &map->items[i];
+		/*
+		 * otherwise, the string at "i" may be string[0:len]
+		 * followed by a string that sorts later than string[len:];
+		 * keep trying.
+		 */
+	}
+	return NULL;
+}
+
 int map_user(struct string_list *map,
 	     char *email, int maxlen_email, char *name, int maxlen_name)
 {
 	char *end_of_email;
 	struct string_list_item *item;
 	struct mailmap_entry *me;
-	char buf[1024], *mailbuf;
-	int i;
+	size_t maillen;
 
 	/* figure out space requirement for email */
 	end_of_email = strchr(email, '>');
@@ -204,18 +241,12 @@ int map_user(struct string_list *map,
 		if (!end_of_email)
 			return 0;
 	}
-	if (end_of_email - email + 1 < sizeof(buf))
-		mailbuf = buf;
-	else
-		mailbuf = xmalloc(end_of_email - email + 1);
-
-	/* downcase the email address */
-	for (i = 0; i < end_of_email - email; i++)
-		mailbuf[i] = tolower(email[i]);
-	mailbuf[i] = 0;
-
-	debug_mm("map_user: map '%s' <%s>\n", name, mailbuf);
-	item = string_list_lookup(map, mailbuf);
+
+	maillen = end_of_email - email;
+
+	debug_mm("map_user: map '%s' <%.*s>\n", name, maillen, email);
+
+	item = lookup_prefix(map, email, maillen);
 	if (item != NULL) {
 		me = (struct mailmap_entry *)item->util;
 		if (me->namemap.nr) {
@@ -226,8 +257,6 @@ int map_user(struct string_list *map,
 				item = subitem;
 		}
 	}
-	if (mailbuf != buf)
-		free(mailbuf);
 	if (item != NULL) {
 		struct mailmap_info *mi = (struct mailmap_info *)item->util;
 		if (mi->name == NULL && (mi->email == NULL || maxlen_email == 0)) {
-- 
1.8.1.304.gf036638

^ permalink raw reply related

* [PATCH v2 02/10] Use split_ident_line to parse author and committer
From: Junio C Hamano @ 2013-01-08  0:10 UTC (permalink / raw)
  To: git; +Cc: Antoine Pelisse
In-Reply-To: <1357603821-8647-1-git-send-email-gitster@pobox.com>

From: Antoine Pelisse <apelisse@gmail.com>

Currently blame.c::get_acline(), pretty.c::pp_user_info() and
shortlog.c::insert_one_record() are parsing author name, email, time
and tz themselves.

Use ident.c::split_ident_line() for better code reuse.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/blame.c    | 59 +++++++++++++++++++-----------------------------------
 builtin/shortlog.c | 36 +++++++++------------------------
 pretty.c           | 35 +++++++++++++++++++-------------
 3 files changed, 52 insertions(+), 78 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index cfae569..dd4aff9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1343,8 +1343,9 @@ static void get_ac_line(const char *inbuf, const char *what,
 			int mail_len, char *mail,
 			unsigned long *time, const char **tz)
 {
-	int len, tzlen, maillen;
-	char *tmp, *endp, *timepos, *mailpos;
+	struct ident_split ident;
+	int len, tzlen, maillen, namelen;
+	char *tmp, *endp, *mailpos;
 
 	tmp = strstr(inbuf, what);
 	if (!tmp)
@@ -1355,7 +1356,10 @@ static void get_ac_line(const char *inbuf, const char *what,
 		len = strlen(tmp);
 	else
 		len = endp - tmp;
-	if (person_len <= len) {
+	if (person_len <= len)
+		goto error_out;
+
+	if (split_ident_line(&ident, tmp, len)) {
 	error_out:
 		/* Ugh */
 		*tz = "(unknown)";
@@ -1364,47 +1368,26 @@ static void get_ac_line(const char *inbuf, const char *what,
 		*time = 0;
 		return;
 	}
-	memcpy(person, tmp, len);
 
-	tmp = person;
-	tmp += len;
-	*tmp = 0;
-	while (person < tmp && *tmp != ' ')
-		tmp--;
-	if (tmp <= person)
-		goto error_out;
-	*tz = tmp+1;
-	tzlen = (person+len)-(tmp+1);
+	namelen = ident.name_end - ident.name_begin;
+	memcpy(person, ident.name_begin, namelen);
+	person[namelen] = 0;
 
-	*tmp = 0;
-	while (person < tmp && *tmp != ' ')
-		tmp--;
-	if (tmp <= person)
-		goto error_out;
-	*time = strtoul(tmp, NULL, 10);
-	timepos = tmp;
+	maillen = ident.mail_end - ident.mail_begin + 2;
+	memcpy(mail, ident.mail_begin - 1, maillen);
+	mail[maillen] = 0;
 
-	*tmp = 0;
-	while (person < tmp && !(*tmp == ' ' && tmp[1] == '<'))
-		tmp--;
-	if (tmp <= person)
-		return;
-	mailpos = tmp + 1;
-	*tmp = 0;
-	maillen = timepos - tmp;
-	memcpy(mail, mailpos, maillen);
+	*time = strtoul(ident.date_begin, NULL, 10);
 
-	if (!mailmap.nr)
-		return;
+	tzlen = ident.tz_end - ident.tz_begin;
 
-	/*
-	 * mailmap expansion may make the name longer.
-	 * make room by pushing stuff down.
-	 */
-	tmp = person + person_len - (tzlen + 1);
-	memmove(tmp, *tz, tzlen);
+	/* Place tz at the end of person */
+	*tz = tmp = person + person_len - (tzlen + 1);
+	memcpy(tmp, ident.tz_begin, tzlen);
 	tmp[tzlen] = 0;
-	*tz = tmp;
+
+	if (!mailmap.nr)
+		return;
 
 	/*
 	 * Now, convert both name and e-mail using mailmap
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index b316cf3..03c6cd7 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -40,40 +40,24 @@ static void insert_one_record(struct shortlog *log,
 	char emailbuf[1024];
 	size_t len;
 	const char *eol;
-	const char *boemail, *eoemail;
 	struct strbuf subject = STRBUF_INIT;
+	struct ident_split ident;
 
-	boemail = strchr(author, '<');
-	if (!boemail)
-		return;
-	eoemail = strchr(boemail, '>');
-	if (!eoemail)
+	if (split_ident_line(&ident, author, strlen(author)))
 		return;
 
 	/* copy author name to namebuf, to support matching on both name and email */
-	memcpy(namebuf, author, boemail - author);
-	len = boemail - author;
-	while (len > 0 && isspace(namebuf[len-1]))
-		len--;
+	len = ident.name_end - ident.name_begin;
+	memcpy(namebuf, ident.name_begin, len);
 	namebuf[len] = 0;
 
 	/* copy email name to emailbuf, to allow email replacement as well */
-	memcpy(emailbuf, boemail+1, eoemail - boemail);
-	emailbuf[eoemail - boemail - 1] = 0;
-
-	if (!map_user(&log->mailmap, emailbuf, sizeof(emailbuf), namebuf, sizeof(namebuf))) {
-		while (author < boemail && isspace(*author))
-			author++;
-		for (len = 0;
-		     len < sizeof(namebuf) - 1 && author + len < boemail;
-		     len++)
-			namebuf[len] = author[len];
-		while (0 < len && isspace(namebuf[len-1]))
-			len--;
-		namebuf[len] = '\0';
-	}
-	else
-		len = strlen(namebuf);
+	len = ident.mail_end - ident.mail_begin;
+	memcpy(emailbuf, ident.mail_begin, len);
+	emailbuf[len] = 0;
+
+	map_user(&log->mailmap, emailbuf, sizeof(emailbuf), namebuf, sizeof(namebuf));
+	len = strlen(namebuf);
 
 	if (log->email) {
 		size_t room = sizeof(namebuf) - len - 1;
diff --git a/pretty.c b/pretty.c
index 5bdc2e7..350d1df 100644
--- a/pretty.c
+++ b/pretty.c
@@ -387,29 +387,36 @@ void pp_user_info(const struct pretty_print_context *pp,
 		  const char *what, struct strbuf *sb,
 		  const char *line, const char *encoding)
 {
+	struct ident_split ident;
+	int linelen, namelen;
+	char *line_end, *date;
 	int max_length = 78; /* per rfc2822 */
-	char *date;
-	int namelen;
 	unsigned long time;
 	int tz;
 
 	if (pp->fmt == CMIT_FMT_ONELINE)
 		return;
-	date = strchr(line, '>');
-	if (!date)
+
+	line_end = strchr(line, '\n');
+	if (!line_end) {
+		line_end = strchr(line, '\0');
+		if (!line_end)
+			return;
+	}
+
+	linelen = ++line_end - line;
+	if (split_ident_line(&ident, line, linelen))
 		return;
-	namelen = ++date - line;
-	time = strtoul(date, &date, 10);
+
+	namelen = ident.mail_end - ident.name_begin + 1;
+	time = strtoul(ident.date_begin, &date, 10);
 	tz = strtol(date, NULL, 10);
 
 	if (pp->fmt == CMIT_FMT_EMAIL) {
-		char *name_tail = strchr(line, '<');
 		int display_name_length;
-		if (!name_tail)
-			return;
-		while (line < name_tail && isspace(name_tail[-1]))
-			name_tail--;
-		display_name_length = name_tail - line;
+
+		display_name_length = ident.name_end - ident.name_begin;
+
 		strbuf_addstr(sb, "From: ");
 		if (needs_rfc2047_encoding(line, display_name_length, RFC2047_ADDRESS)) {
 			add_rfc2047(sb, line, display_name_length,
@@ -427,10 +434,10 @@ void pp_user_info(const struct pretty_print_context *pp,
 		}
 		if (namelen - display_name_length + last_line_length(sb) > max_length) {
 			strbuf_addch(sb, '\n');
-			if (!isspace(name_tail[0]))
+			if (!isspace(ident.name_end[0]))
 				strbuf_addch(sb, ' ');
 		}
-		strbuf_add(sb, name_tail, namelen - display_name_length);
+		strbuf_add(sb, ident.name_end, namelen - display_name_length);
 		strbuf_addch(sb, '\n');
 	} else {
 		strbuf_addf(sb, "%s: %.*s%.*s\n", what,
-- 
1.8.1.304.gf036638

^ permalink raw reply related

* [PATCH v2 00/10] reroll of ap/log-mailmap
From: Junio C Hamano @ 2013-01-08  0:10 UTC (permalink / raw)
  To: git; +Cc: Antoine Pelisse

This is a reroll of the previous series Antoine posted on Saturday.

A new patch "string-list: allow case-insensitive string list"
teaches the string-list API that some string lists can be sorted
case insensitively (actually, you can feed any custom two string
comparison functions).

The string_list_lookup_extended() function introduced by the
previous series has been discarded.  Instead, the third patch
"mailmap: remove email copy and length limitation" introduces a
helper function that takes a <char *, size_t> key that is not NUL
terminated to look for a matching item in a string list, and uses
that to update map_user() function, together with the fourth
patch "mailmap: simplify map_user() interface".

All other patches are unmodified from Antoine's series (modulo
wording tweaks here and there).

Antoine Pelisse (9):
  Use split_ident_line to parse author and committer
  mailmap: remove email copy and length limitation
  mailmap: simplify map_user() interface
  mailmap: add mailmap structure to rev_info and pp
  pretty: use mailmap to display username and email
  log: add --use-mailmap option
  test: add test for --use-mailmap option
  log: grep author/committer using mailmap
  log: add log.mailmap configuration option

Junio C Hamano (1):
  string-list: allow case-insensitive string list

 Documentation/config.txt  |   4 +
 Documentation/git-log.txt |   5 ++
 builtin/blame.c           | 183 ++++++++++++++++++++++------------------------
 builtin/log.c             |  16 +++-
 builtin/shortlog.c        |  54 ++++----------
 commit.h                  |   1 +
 log-tree.c                |   1 +
 mailmap.c                 |  94 +++++++++++++++---------
 mailmap.h                 |   4 +-
 pretty.c                  | 114 ++++++++++++++++-------------
 revision.c                |  54 ++++++++++++++
 revision.h                |   1 +
 string-list.c             |  17 ++++-
 string-list.h             |   4 +
 t/t4203-mailmap.sh        |  56 ++++++++++++++
 15 files changed, 379 insertions(+), 229 deletions(-)

-- 
1.8.1.304.gf036638

^ permalink raw reply

* [PATCH v2 01/10] string-list: allow case-insensitive string list
From: Junio C Hamano @ 2013-01-08  0:10 UTC (permalink / raw)
  To: git; +Cc: Antoine Pelisse
In-Reply-To: <1357603821-8647-1-git-send-email-gitster@pobox.com>

Some string list needs to be searched case insensitively, and for
that to work correctly, the string needs to be sorted case
insensitively from the beginning.

Allow a custom comparison function to be defined on a string list
instance and use it throughout in place of strcmp().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 string-list.c | 17 +++++++++++++----
 string-list.h |  4 ++++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/string-list.c b/string-list.c
index 397e6cf..6f3d8cf 100644
--- a/string-list.c
+++ b/string-list.c
@@ -7,10 +7,11 @@ static int get_entry_index(const struct string_list *list, const char *string,
 		int *exact_match)
 {
 	int left = -1, right = list->nr;
+	compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
 
 	while (left + 1 < right) {
 		int middle = (left + right) / 2;
-		int compare = strcmp(string, list->items[middle].string);
+		int compare = cmp(string, list->items[middle].string);
 		if (compare < 0)
 			right = middle;
 		else if (compare > 0)
@@ -96,8 +97,9 @@ void string_list_remove_duplicates(struct string_list *list, int free_util)
 {
 	if (list->nr > 1) {
 		int src, dst;
+		compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
 		for (src = dst = 1; src < list->nr; src++) {
-			if (!strcmp(list->items[dst - 1].string, list->items[src].string)) {
+			if (!cmp(list->items[dst - 1].string, list->items[src].string)) {
 				if (list->strdup_strings)
 					free(list->items[src].string);
 				if (free_util)
@@ -230,15 +232,20 @@ struct string_list_item *string_list_append(struct string_list *list,
 			list->strdup_strings ? xstrdup(string) : (char *)string);
 }
 
+/* Yuck */
+static compare_strings_fn compare_for_qsort;
+
+/* Only call this from inside sort_string_list! */
 static int cmp_items(const void *a, const void *b)
 {
 	const struct string_list_item *one = a;
 	const struct string_list_item *two = b;
-	return strcmp(one->string, two->string);
+	return compare_for_qsort(one->string, two->string);
 }
 
 void sort_string_list(struct string_list *list)
 {
+	compare_for_qsort = list->cmp ? list->cmp : strcmp;
 	qsort(list->items, list->nr, sizeof(*list->items), cmp_items);
 }
 
@@ -246,8 +253,10 @@ struct string_list_item *unsorted_string_list_lookup(struct string_list *list,
 						     const char *string)
 {
 	int i;
+	compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
+
 	for (i = 0; i < list->nr; i++)
-		if (!strcmp(string, list->items[i].string))
+		if (!cmp(string, list->items[i].string))
 			return list->items + i;
 	return NULL;
 }
diff --git a/string-list.h b/string-list.h
index c50b0d0..446e79e 100644
--- a/string-list.h
+++ b/string-list.h
@@ -5,10 +5,14 @@ struct string_list_item {
 	char *string;
 	void *util;
 };
+
+typedef int (*compare_strings_fn)(const char *, const char *);
+
 struct string_list {
 	struct string_list_item *items;
 	unsigned int nr, alloc;
 	unsigned int strdup_strings:1;
+	compare_strings_fn cmp; /* NULL uses strcmp() */
 };
 
 #define STRING_LIST_INIT_NODUP { NULL, 0, 0, 0 }
-- 
1.8.1.304.gf036638

^ permalink raw reply related

* Re: [PATCH 01/10] list_lookup: create case and length search
From: Junio C Hamano @ 2013-01-07 22:37 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: git
In-Reply-To: <7vmwwnp84p.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Antoine Pelisse <apelisse@gmail.com> writes:
>
>> Create a new function to look-up a string in a string_list, but:
>>  - add a new parameter to ignore case differences
>>  - add a length parameter to search for a substring
>>
>> The idea is to avoid several copies (lowering a string before searching
>> it when we just want to ignore case), or copying a substring of a bigger
>> string to search it in the string_list
>>
>> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
>> ---
>
> I did not read beyond the log message and the function signature of
> the updated get_entry_index(), but it strikes me insane to build a
> sorted list using case sensitive full string comarison and then to
> look for an element in that list using a different comparison
> criteria (e.g. case insensitive comparison) and to expect the
> bisection search to still work.  Shouldn't the codepath that builds
> the string-list be sorting the list in case insensitive way from the
> beginning if this were to work correctly?
>
> It seems to suggest to me that this "are the keys case sensitive?"
> bit belongs to the entire struct string_list structure as its
> property (similar to the way "do the keys need to be strdup'ed?"
> bit), not something you would flip per invocation basis of the
> lookup function.
>
> Also isn't size_t an unsigned type?  What does it even mean to pass
> "-1" to it, and propagate it down to strncmp()?
>
> If you have a list sorted by a key, and if you want to query it with
> a partial prefix of a possibly valid key, I think you shouldn't have
> to add the "length search" at all. The existing look up function
> would return the location in the list that short key would have been
> inserted, which would come before the first entry that your short
> key is a prefix of, so the caller can iterate the list from there to
> find all entries.  In other words, if existing list has "aaa",
> "bba", and "bbc", and you want to grab entries that begin with "bb",
> you can ask for "bb" to the loop up function, which would say "the
> key does not exist in the list, but it would be inserted before this
> 'bba' entry".  Then you can discover that "bba" and "bbc" both
> matches the shortened key you have, no?

In the above, I mixed up which string is a prefix of which, but the
overall comment still stands, I think.

Your mailmap string list has "Ee@eee" (email part only) stored, while
the record from author/committer line has "NNNN <EE@EEE> TTTTTT sZZZZ",
to hold name, timestamp and zone.  You have a pointer email pointing
at the first "E" in that record, with maillen set to 6, and see if
the mailmap contains a string that match "EE@EEE".  So I mixed up
the prefix-ness of these strings.

Let's address the case sensitivety first.  The string_list_lookup()
and your string_list_lookup_extended() functions both work on a
string list that is built by add_mapping() function that asks
string_list_find_insert_index() to find an insertion point in the
sorted string list.  If this is not done using the same case
insensitive manner for a string list that is meant to be queried
case insensitively, the resulting list is still sorted case
sensitively, and look-up functions that work by binary search won't
find what you are looking for when they try to find the string using
case-insensitive comparison.  The change to implement it would
probably look like the attached patch.

Assuming that we can leave the case sensitivity behind us that way,
we still would need a new helper function

    string_list_lookup_prefix(struct string_list *, const char*, size_t)

so that you can find an existing "Ee@eee" entry in the string list
by email that points at "EE@EEE> TTTTTT sZZZZ" with maillen of 6.

I am wondering if we can do that without adding the length-limited
sort in the low-level get_entry_index() to do so, though.

You could first ask string_list_find_insert_index() to locate the
insertion point for "EE@EEE> TTTTTT sZZZZ".  That has to come after
"Ee@eee" if it exists (or it could be "De@eee" in which case you
know there is no match).

 1. Ask string_list_find_insert_index() to locate "EE@EEE> TTTTTT..."

    a. If it says there is an exact match, then that did not match
       "EE@EEE" list (you found "EE@EEE> TTTTTT sZZZZ").  You might
       still find "EE@EEE" before that record, though.  Also your
       key may have been "EE@EEE" exactly, in which case you already
       found what you were looking for.

    b. If it says there is not an exact match, the returned value is
       pointing at a record that comes after "EE@EEE> TTTTTT sZZZZ" if
       you were to insert it, which definitely is after "EE@EEE".

    So in either case, you know what you are looking for must come
    before string_list_find_insert_index() gave you.  Let's call
    that position "i".

    You still need to be careful that there could be an existing
    entry whose key is "EE@EEE" followed by a byte that sorts before
    ">" or " " (we do not want to limit this new generic string-list
    function to operate only on e-mail addresses), though.  So the
    above "must come before" is not "must come immediately before".

 2. Then compare map->items[i].string and email using strncasecmp()
    for maillen bytes, and make sure map->items[].string is exactly
    maillen bytes long.  If it matches, you found what you were
    looking for.  Otherwise, decrement "i" (e.g. the overlong key
    string you had was "EE@EEE> TTTTTT..." and the record at one
    before the insertion point was "EE@EEE\t" which sorts before it,
    but the string-list may have "EE@EEE" before that entry), see if
    the first maillen bytes still match, and continue.  Once you
    decrement i enough, the first maillen bytes won't match
    (e.g. you hit "De@EEE"), or "i" goes negative, and you can stop.

or something like that.  I'll cook up a patch and see how ugly it
ends up with.

-- >8 --
Subject: [PATCH] string-list: allow case-insensitive string list

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * At the very beginning of mailmap.c::read_mailmap(), you would say
   something like:

	int read_mailmap(struct string_list *map, char **repo_ident)
        {
        	map->strdup_strings = 1;
       +        map->cmp = strcasecmp;
                /* each failure returns 1, so >1 means both calls ...

   to make the mailmap string list a case insensitive one.

 string-list.c | 17 +++++++++++++----
 string-list.h |  4 ++++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/string-list.c b/string-list.c
index 397e6cf..6f3d8cf 100644
--- a/string-list.c
+++ b/string-list.c
@@ -7,10 +7,11 @@ static int get_entry_index(const struct string_list *list, const char *string,
 		int *exact_match)
 {
 	int left = -1, right = list->nr;
+	compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
 
 	while (left + 1 < right) {
 		int middle = (left + right) / 2;
-		int compare = strcmp(string, list->items[middle].string);
+		int compare = cmp(string, list->items[middle].string);
 		if (compare < 0)
 			right = middle;
 		else if (compare > 0)
@@ -96,8 +97,9 @@ void string_list_remove_duplicates(struct string_list *list, int free_util)
 {
 	if (list->nr > 1) {
 		int src, dst;
+		compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
 		for (src = dst = 1; src < list->nr; src++) {
-			if (!strcmp(list->items[dst - 1].string, list->items[src].string)) {
+			if (!cmp(list->items[dst - 1].string, list->items[src].string)) {
 				if (list->strdup_strings)
 					free(list->items[src].string);
 				if (free_util)
@@ -230,15 +232,20 @@ struct string_list_item *string_list_append(struct string_list *list,
 			list->strdup_strings ? xstrdup(string) : (char *)string);
 }
 
+/* Yuck */
+static compare_strings_fn compare_for_qsort;
+
+/* Only call this from inside sort_string_list! */
 static int cmp_items(const void *a, const void *b)
 {
 	const struct string_list_item *one = a;
 	const struct string_list_item *two = b;
-	return strcmp(one->string, two->string);
+	return compare_for_qsort(one->string, two->string);
 }
 
 void sort_string_list(struct string_list *list)
 {
+	compare_for_qsort = list->cmp ? list->cmp : strcmp;
 	qsort(list->items, list->nr, sizeof(*list->items), cmp_items);
 }
 
@@ -246,8 +253,10 @@ struct string_list_item *unsorted_string_list_lookup(struct string_list *list,
 						     const char *string)
 {
 	int i;
+	compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
+
 	for (i = 0; i < list->nr; i++)
-		if (!strcmp(string, list->items[i].string))
+		if (!cmp(string, list->items[i].string))
 			return list->items + i;
 	return NULL;
 }
diff --git a/string-list.h b/string-list.h
index c50b0d0..446e79e 100644
--- a/string-list.h
+++ b/string-list.h
@@ -5,10 +5,14 @@ struct string_list_item {
 	char *string;
 	void *util;
 };
+
+typedef int (*compare_strings_fn)(const char *, const char *);
+
 struct string_list {
 	struct string_list_item *items;
 	unsigned int nr, alloc;
 	unsigned int strdup_strings:1;
+	compare_strings_fn cmp; /* NULL uses strcmp() */
 };
 
 #define STRING_LIST_INIT_NODUP { NULL, 0, 0, 0 }
-- 
1.8.1.304.gf036638

^ permalink raw reply related

* Re: [PATCH] Remove the suggestion to use parsecvs, which is currently broken.
From: Eric S. Raymond @ 2013-01-07 20:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Heiko Voigt, git
In-Reply-To: <7vehhwiyt6.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com>:
> In the longer term, if parsecvs is revived either by Eric or
> somebody else, we will add the mention back to the documentation,
> probably with an updated URL.

I'm working on the revival right now. Repository generation is still
broken, and likely to remain so until I can make the export-stream stage
work, but just a few minutes ago I coaxed it into generating what looks 
like graphviz markup describing a commit graph on standard output.

Even though dot(1) barfs on the markup, this is encouraging. It almost
certainly means that the analysis and parsing stages aren't broken, and
by stubbing out enough functions I can figure out what is being passed
to the broken repository-maker well enough for my purposes.

Actually, I've already figured out how to generate blob and commit-header
markup.  The hard part is generating fileops; I don't quite understand
the generated data structures well enough to do that yet.  But I'm
making progress, and feeling more optimistic than I was yesterday.

In related news, I've sent Michael Haggerty patches that fix the visible
problems with cvs2git that I enumerated in previous mail.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply

* Re: [PATCH v2] status: always report ignored tracked directories
From: Antoine Pelisse @ 2013-01-07 20:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Torsten Bögershausen, git
In-Reply-To: <7vip78izir.fsf@alter.siamese.dyndns.org>

> Here is my attempt...
>
>     When enumerating paths that are ignored, paths the index knows
>     about are not included in the result.  The "index knows about"
>     check is done by consulting the name hash, not the actual
>     contents of the index:
>
>      - When core.ignorecase is false, directory names are not in the
>        name hash, and ignored ones are shown as ignored (directories
>        can never be tracked anyway).
>
>      - When core.ignorecase is true, however, the name hash keeps
>        track of the names of directories, in order to detect
>        additions of the paths under different cases.  This causes
>        ignored directories to be mistakenly excluded when
>        enumerating ignored paths.
>
>     Stop excluding directories that are in the name hash when
>     looking for ignored files in dir_add_name(); the names that are
>     actually in the index are excluded much earlier in the callchain
>     in treat_file(), so this fix will not make them mistakenly
>     identified as ignored.
>
>     Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
>     Reviewed-by: Jeff King <peff@peff.net>
>     Signed-off-by: Junio C Hamano <gitster@pobox.com>

I like it a lot, thanks to both of you

^ permalink raw reply

* The tip of 'next' has been rewound.
From: Junio C Hamano @ 2013-01-07 20:04 UTC (permalink / raw)
  To: git

Just to let you contributors know, your next "git fetch" will notice
that remotes/origin/next has been rewound.  Two topics have been
kicked back to 'pu', and the branch has been reordered somewhat.

I'll start merging bugfix topics that have graduated to master since
v1.8.1 to maint so that we can have v1.8.1.1 sometime next week.  We
haven't seen any real regression report for v1.8.1 since tagging it,
and it has been a week, so this may be the first 'maint' release
without "brown paper bag regression fix" for quite some time ;-)

^ permalink raw reply

* Re: [PATCH] Remove the suggestion to use parsecvs, which is currently broken.
From: Junio C Hamano @ 2013-01-07 19:21 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Eric S. Raymond, git
In-Reply-To: <20130106163420.GA3378@book-mint>

Heiko Voigt <hvoigt@hvoigt.net> writes:

> Hi,
>
> On Fri, Dec 28, 2012 at 11:42:00PM -0500, Eric S. Raymond wrote:
>> Heiko Voigt <hvoigt@hvoigt.net>:
>> > Maybe you could add that information to the parsecvs compile
>> > instructions? I think just because it takes some effort to compile does
>> > not justify to remove this useful pointer here. When I was converting a
>> > legacy cvs repository this pointer would have helped me a lot.
>> 
>> I'm parsecvs's maintainer now.  It's not in good shape; there is at
>> least one other known showstopper besides the build issue.  I would
>> strongly prefer to direct peoples' attention away from it until I
>> have time to fix it and cut a release.  This is not a distant 
>> prospect - two or three weeks out, maybe.
>
> So for this short amount of time you want to change gits documentation?
> Is this hint causing you trouble? Are there many people asking for
> support because of that?
>
> Even if it takes some effort to get parsecvs running I would like to
> keep the hint to a good and proven cvs importer.

I do not mind changing the documentation, but now I re-read the
change, I tend to agree that dropping the last (un)maintained
version of parsecvs may be detrimental.  Most people will not
download parsecvs from keith's page, but first will try the one
shipped with their distros, and the recent maintainer change for
that tool would not have any impact to these users.

But at the same time, by the time this change reaches these users
who may benefit by the mention of parsecvs via the distro route, the
situation may be different, so I do not think it is such a big deal.
In the longer term, if parsecvs is revived either by Eric or
somebody else, we will add the mention back to the documentation,
probably with an updated URL.

^ permalink raw reply

* [PATCH] Prevent space after directories in tcsh completion
From: Marc Khouzam @ 2013-01-07 19:07 UTC (permalink / raw)
  To: 'git@vger.kernel.org'
  Cc: 'Junio C Hamano', 'felipe.contreras@gmail.com',
	'szeder@ira.uka.de'

If git-completion.bash returns a single directory as a completion,
tcsh will automatically add a space after it, which is not what the
user wants.

This commit prevents tcsh from doing this.

Also, a check is added to make sure the tcsh version used is recent
enough to allow completion to work as expected.

Signed-off-by: Marc Khouzam <marc.khouzam@ericsson.com>
---

This update is meant to have tcsh completion work better if the
feature "git-completion.bash: add support for path completion"
is accepted.
See http://www.mail-archive.com/git@vger.kernel.org/msg14137.html
This commit does not depend on that other feature though and can
be applied right away.

Furthermore, based on feedback I received, some users are running
versions of tcsh that are over 5 years old and don't provide the
proper support for this script.  I've added a check to let the user
know of such (sad) situation nicely.

Thanks

Marc

 contrib/completion/git-completion.tcsh | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/contrib/completion/git-completion.tcsh b/contrib/completion/git-completion.tcsh
index 8aafb63..3e3889f 100644
--- a/contrib/completion/git-completion.tcsh
+++ b/contrib/completion/git-completion.tcsh
@@ -13,6 +13,7 @@
 #
 # To use this completion script:
 #
+#    0) You need tcsh 6.16.00 or newer.
 #    1) Copy both this file and the bash completion script to ${HOME}.
 #       You _must_ use the name ${HOME}/.git-completion.bash for the
 #       bash script.
@@ -24,6 +25,15 @@
 #        set autolist=ambiguous
 #       It will tell tcsh to list the possible completion choices.
 
+set __git_tcsh_completion_version = `\echo ${tcsh} | \sed 's/\./ /g'`
+if ( ${__git_tcsh_completion_version[1]} < 6 || \
+     ( ${__git_tcsh_completion_version[1]} == 6 && \
+       ${__git_tcsh_completion_version[2]} < 16 ) ) then
+	echo "git-completion.tcsh: Your version of tcsh is too old, you need version 6.16.00 or newer.  Git completion will not work."
+	exit
+endif
+unset __git_tcsh_completion_version
+
 set __git_tcsh_completion_original_script = ${HOME}/.git-completion.bash
 set __git_tcsh_completion_script = ${HOME}/.git-completion.tcsh.bash
 
@@ -64,9 +74,7 @@ fi
 _\${1}
 
 IFS=\$'\n'
-if [ \${#COMPREPLY[*]} -gt 0 ]; then
-	echo "\${COMPREPLY[*]}" | sort | uniq
-else
+if [ \${#COMPREPLY[*]} -eq 0 ]; then
 	# No completions suggested.  In this case, we want tcsh to perform
 	# standard file completion.  However, there does not seem to be way
 	# to tell tcsh to do that.  To help the user, we try to simulate
@@ -85,19 +93,20 @@ else
 		# We don't support ~ expansion: too tricky.
 		if [ "\${TO_COMPLETE:0:1}" != "~" ]; then
 			# Use ls so as to add the '/' at the end of directories.
-			RESULT=(\`ls -dp \${TO_COMPLETE}* 2> /dev/null\`)
-			echo \${RESULT[*]}
-
-			# If there is a single completion and it is a directory,
-			# we output it a second time to trick tcsh into not adding a space
-			# after it.
-			if [ \${#RESULT[*]} -eq 1 ] && [ "\${RESULT[0]: -1}" == "/" ]; then
-				echo \${RESULT[*]}
-			fi
+			COMPREPLY=(\`ls -dp \${TO_COMPLETE}* 2> /dev/null\`)
 		fi
 	fi
 fi
 
+# tcsh does not automatically remove duplicates, so we do it ourselves
+echo "\${COMPREPLY[*]}" | sort | uniq
+
+# If there is a single completion and it is a directory, we output it
+# a second time to trick tcsh into not adding a space after it.
+if [ \${#COMPREPLY[*]} -eq 1 ] && [ "\${COMPREPLY[0]: -1}" == "/" ]; then
+	echo "\${COMPREPLY[*]}"
+fi
+
 EOF
 
 # Don't need this variable anymore, so don't pollute the users environment
-- 
1.8.1.367.g8e14972.dirty

^ permalink raw reply related

* Re: [PATCH v2] status: always report ignored tracked directories
From: Junio C Hamano @ 2013-01-07 19:06 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: Jeff King, tboegi, git
In-Reply-To: <1357510179-22852-1-git-send-email-apelisse@gmail.com>

Antoine Pelisse <apelisse@gmail.com> writes:

First, thanks for working on this.

The explanation is a bit confusing, especially for people like me,
as it does not make it clear that there are two kinds of "paths in
the index".  Files can be added to the index ("git add" and then
shown via "ls-files") and these are what we usually call "files in
the index".  But there is a separate "name hash" that keeps track of
"paths the index knows about" in play, and that is what is discussed
in the description.

> Tracked directories (i.e. directories containing tracked files)
> that are ignored must be reported as ignored if they contain
> untracked files.
>
> Currently, files in the index can't be reported as ignored and are
> automatically dropped from the list:

"file in the index" will never be ignored, period.  Some paths the
index knows about, namely, directory names in name hash, may need to
be reported as ignored.

It is not clear what "list" the above "dropped from the list" refers
to.  The list of paths that becomes the result of "status"?

>  - When core.ignorecase is false, directories (which are not directly
>  tracked) are not listed as part of the index, and the directory can be
>  shown as ignored.
>
>  - When core.ignorecase is true on the other hand, directories are
>  reported as part of the index, and the directory is dropped, thus not
>  displayed as ignored.
>
> Fix that behavior by allowing indexed file to be added when looking for
> ignored files.
>
>  - Ignored tracked and untracked directories are treated the same way
>  when looking for ignored files, so we should not care about their index
>  status.
>  - Files are dismissed by treat_file() if they belong to the
>  index, so that step would have been a no-op anyway.

Here is my attempt...

    When enumerating paths that are ignored, paths the index knows
    about are not included in the result.  The "index knows about"
    check is done by consulting the name hash, not the actual
    contents of the index:

     - When core.ignorecase is false, directory names are not in the
       name hash, and ignored ones are shown as ignored (directories
       can never be tracked anyway).

     - When core.ignorecase is true, however, the name hash keeps
       track of the names of directories, in order to detect
       additions of the paths under different cases.  This causes
       ignored directories to be mistakenly excluded when
       enumerating ignored paths.

    Stop excluding directories that are in the name hash when
    looking for ignored files in dir_add_name(); the names that are
    actually in the index are excluded much earlier in the callchain
    in treat_file(), so this fix will not make them mistakenly
    identified as ignored.

    Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
    Reviewed-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff --git a/dir.c b/dir.c
index 9b80348..f836590 100644
--- a/dir.c
+++ b/dir.c
@@ -672,7 +672,8 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len)
 
 static struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len)
 {
-	if (cache_name_exists(pathname, len, ignore_case))
+	if (!(dir->flags & DIR_SHOW_IGNORED) &&
+	    cache_name_exists(pathname, len, ignore_case))
 		return NULL;
 
 	ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);
@@ -877,11 +878,7 @@ static int treat_file(struct dir_struct *dir, struct strbuf *path, int exclude,
 	if (exclude)
 		exclude_file = !(dir->flags & DIR_SHOW_IGNORED);
 	else if (dir->flags & DIR_SHOW_IGNORED) {
-		/*
-		 * Optimization:
-		 * Don't spend time on indexed files, they won't be
-		 * added to the list anyway
-		 */
+		/* Always exclude indexed files */
 		struct cache_entry *ce = index_name_exists(&the_index,
 		    path->buf, path->len, ignore_case);
 

^ permalink raw reply related

* [PATCH] git clone depth of 0 not possible.
From: Stefan Beller @ 2013-01-07 18:06 UTC (permalink / raw)
  To: schlotter, gitster, Ralf.Wildenhues, git; +Cc: Stefan Beller

Currently it is not possible to have a shallow depth of
just 0, i.e. only one commit in that repository after cloning.
The minimum number of commits is 2, caused by depth=1.

I had no good idea how to add this behavior to git clone as
the depth variable in git_transport_options struct (file transport.h)
uses value 0 for another meaning, so it would have need changes at
all places, where the transport options depth is being used 
(e.g. fetch)

So I documented the current behavior, see attached patch.

Stefan Beller (1):
  Documentation on depth option in git clone.

 Documentation/git-clone.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
1.8.1.80.g3e293fb.dirty

^ permalink raw reply

* Re: [PATCH] refs: do not use cached refs in repack_without_ref
From: Martin Fick @ 2013-01-07 18:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, git, Junio C Hamano
In-Reply-To: <201301071109.12086.mfick@codeaurora.org>

...[Sorry about the previous HTML reposts]

Jeff King <peff@peff.net> wrote:
>On Mon, Dec 31, 2012 at 03:30:53AM -0700, Martin Fick 
wrote:
>
>> The general approach is to setup a transaction and
>> either commit or abort it. A transaction can be setup
>> by renaming an appropriately setup directory to the
>> "ref.lock" name. If the rename succeeds, the
>> transaction is begun. Any actor can abort the
>> transaction (up until it is committed) by simply
>> deleting the "ref.lock" directory, so it is not at
>> risk of going stale.
>
> Deleting a directory is not atomic, as you first have
> to remove the contents, putting it into a potentially
> inconsistent state. I'll assume you deal with that
> later...

Right, these simple single file transactions have at 
best 1 important file/directory in them, once deleted 
the transaction is aborted (can no longer complete).  
However to support multi file transactions, a better 
approach is likely to rename the uuid directory to have 
a .delete extension before deleting stuff in it.

> > One important piece of the transaction is the use
> > of uuids. The uuids provide a mechanism to tie the
> > atomic commit pieces to the transactions and thus to
> > prevent long sleeping process from inadvertently
> > performing actions which could be out of date when
> > they wake finally up.
> >
>Has this been a problem for you in practice?

No, but as you say, we don't currently hold locks for 
very long. I anticipate it being a problem in a 
clustered environment when transactions start spanning 
repos from java processes, with insane amounts of RAM, 
which can sometimes have unpredictable indeterminately 
long java GC cycles at inopportune times.. It would seem 
short sighted if Gerrit at least did not assume this 
will be a problem.

But, deletes today in git are not so short and Michael's 
fixes may make things worse? But, as you point out, that 
should perhaps be solved a different way.

> Avoiding this is one of the reasons that git does not
> take out long locks; instead, it takes the lock only
> at the moment it is ready to write, and aborts if it
> has been updated since the longer-term operation
> began. This has its own problems (you might do a lot
> of work only to have your operation aborted), but I
> am not sure that your proposal improves on that.

It does not, it might increase this.

> Git typically holds ref locks for a few syscalls. If
> you are conservative about leaving potentially stale
> locks in place (e.g., give them a few minutes to
> complete before assuming they are now bogus), you will
> not run into that problem.

In a distributed environment even a few minutes might 
not be enough, processes could be on a remote server 
with a temporarily split network, that could cause 
delays longer than your typical local expectations.

But there is also the other piece of this problem, how 
do you detect stale locks? How long will it be stale 
until a user figures it out and reports it? How many 
other users will simply have failed pushes and wonder 
why without reporting them?

> > In each case, the atomic commit piece is the
> > renaming of a file. For the create and update
> > pieces, a file is renamed from the "ref.lock" 
> > dir to the "ref" file resulting in an update to 
> > the sha for the ref.
>
> I think we've had problems with cross-directory
> renames on some filesystems, but I don't recall the
> details. I know that Coda does not like 
> cross-directory links, but cross-directory renames 
> are OK (and in fact we fall back to the latter when
> the former does not work).
>
> Ah, here we go: 5723fe7 (Avoid cross-directory renames
> and linking on object creation, 2008-06-14). Looks
> like NFS is the culprit.

If the renames fail we can fall back to regular file 
locking, the hard part to detect and deal with would be 
if the renames don't fail but become copies/mkdirs.

>> In the case of a delete, the actor may verify that
>> "ref" currently contains the sha to "prune" if it
>> needs to, and then renames the "ref" file to
>> "ref.lock/uuid/delete". On success, the ref was
>> deleted.
>>
>> Whether successful or not, the actor may now simply
>> delete the "ref.lock" directory, clearing the way for
>> a new transaction. Any other actor may delete this
>> directory at any time also, likely either on conflict
>> (if they are attempting to initiate a transaction),
>> or after a grace period just to cleanup the FS. Any
>> actor may also safely cleanup the tmp directories,
>> preferably also after a grace period.
>
> Hmm. So what happens to the "delete" file when the
> ref.lock directory is being deleted? Presumably
> deleting the ref.lock directory means doing it
> recursively (which is non-atomic). But then why are 
> we keeping the delete file at all, if we're just about
> to remove it?

We are not trying to keep it, but we need to ensure that 
our transaction has not yet been aborted: the rename 
does this.  If we just deleted the file, we may sleep 
and another transaction may abort our transaction and 
complete before we wake up and actually delete the file. 
But by using a rename we tie the delete atomically to 
the transaction, it cannot succeed if our transaction 
was aborted during a sleep since the directory we are 
renaming the file into would be gone! This is sort of 
the magic piece that makes the whole scheme special, 
safe deletes tend to be the hardest part.

>What happens if another process wants to cancel a
> transaction that is partially done? That is, the
> ref.lock directory is created, but it contains the
> uuid subdir? It sounds like it's OK to just delete
> from it, and the original process will then fail at
> its rename?

Yes, exactly.  But again, it might be better to cause a 
rename of the uuid dir before deleting it.

>> One neat part about this scheme is that I believe it
>> would be backwards compatible with the current
>> locking mechanism since the transaction directory
>> will simply appear to be a lock to older clients. 
>> And the old lock file should continue to lock
>> out these newer transactions.
>
>Yeah, I don't see anything that would prevent that. 
> The current code only cares about open("$ref.lock",
> O_EXCL). But that should be correct and atomic with
> respect to the renamed directories. You are depending
> on atomic directory renames. Those aren't used
> anywhere yet in git, as far as I know. So that may run
> into problems (but there's no reason this system
> couldn't be optional for filesystems that are more
> abled, and other systems could fall back to the
> straight-file locking).

Right.

> So in response to your question, no, I don't see any
> real showstoppers here. 

Alright, cool, thanks for the review and analysis, I 
appreciate it.

> And unlike the "all refs are files in a directory"
> scheme, it's confined to writing, which solves the
> readdir() atomicity questions I had there.

Right, I couldn't see a way around those, I think it was 
inherently flawed.

> I still can't say I'm super excited about it, just
> because it seems like a solution in search of a
> problem that I have not experienced myself.

It might be.  But at least a solution is out there now. 
If time indicates that it is needed, I feel better 
knowing there is a solution. 

Murphy has a way of making unlikely problems real 
annoyances in automated distributed environments. The 
problem is not usually one real annoying problem, those 
get fixed. The problem we deal mostly with is 1000 
infrequent problems, all different, but they add up to a 
system which needs constant maintenance. And the 
maintenance is hard since each problem is different, the 
analysis is difficult and requires expertise.

> Fixing the minor remaining races on ref packing would
> be nice, but I do not think those are a problem of the
> on-disk lock representation. The reason they are not
> fixed now is from an attempt to keep lock contention
> low between unrelated updates (something which we
> should probably give up on in favor of correctness;
> but that is orthogonal to whether we do it with the
> existing file locks or with a new system).

Agreed.

>A much stronger fix for that would be to record deletes
> in the loose ref storage so that we don't have to
> update the packed-refs file as often (because it is
> inherently a lock bottleneck, and because it is
> wasteful when you have a very large number of refs).
> But that means dealing with directory/file conflicts
> between deleted and existing branches, which is a
> pain.

Makes sense.

Thanks again for your thoughts,

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply

* [PATCH] Documentation on depth option in git clone.
From: Stefan Beller @ 2013-01-07 18:10 UTC (permalink / raw)
  To: gitster, schlotter, Ralf.Wildenhues, git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Documentation/git-clone.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 7fefdb0..e76aa50 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -186,7 +186,8 @@ objects from the source repository into a pack in the cloned repository.
 	it, nor push from nor into it), but is adequate if you
 	are only interested in the recent history of a large project
 	with a long history, and would want to send in fixes
-	as patches.
+	as patches. The depth should be at least 1. If it is 0 or
+	below, the cloned repository will not be shallow.
 
 --single-branch::
 	Clone only the history leading to the tip of a single branch,
-- 
1.8.1.80.g3e293fb.dirty

^ permalink raw reply related

* Re: [PATCH 1/4] test: Add target test-lint-shell-syntax
From: Junio C Hamano @ 2013-01-07 18:07 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Jeff King, git
In-Reply-To: <50EB0928.3090901@web.de>

Torsten Bögershausen <tboegi@web.de> writes:

> Sorry for late answer, but there is a problem (both linux and Mac OS X) :-(
> $ make test-lint does not do shel syntax check, neither
> $ make test-lint-shell-syntax

In which directory?

    $ make -C t test-lint-shell-syntax
    ... passes silently ...
    $ ed t/t0000-basic.sh
    /test_expect_success/
    a
            which sh
    .
    w
    q
    $ make -C t test-lint-shell-syntax
    t0000-basic.sh:28: error: which is not portable (please use type):      which sh
    make: *** [test-lint-shell-syntax] Error 1

If you edit out '@' (but nothing else) from this line:

> 	@'$(PERL_PATH_SQ)' check-non-portable-shell.pl $(T)

and run the above again, you would see that it is running this shell
command:

'/usr/bin/perl' check-non-portable-shell.pl t0000-basic.sh t0001-init.sh ...

If you introduce a Perl syntax error to check-non-portable-shell.pl,
like this, you will get:

    $ make -C t test-lint-shell-syntax
    syntax error at check-non-portable-shell.pl line 11, near "whoa

So... is your shell broken?  The above seems to work for dash, bash,
ksh and zsh.

^ permalink raw reply

* [PATCH] Documentation on depth option in git clone.
From: Stefan Beller @ 2013-01-07 18:06 UTC (permalink / raw)
  To: schlotter, gitster, Ralf.Wildenhues, git; +Cc: Stefan Beller
In-Reply-To: <1357581996-17505-1-git-send-email-stefanbeller@googlemail.com>

---
 Documentation/git-clone.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 7fefdb0..e76aa50 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -186,7 +186,8 @@ objects from the source repository into a pack in the cloned repository.
 	it, nor push from nor into it), but is adequate if you
 	are only interested in the recent history of a large project
 	with a long history, and would want to send in fixes
-	as patches.
+	as patches. The depth should be at least 1. If it is 0 or
+	below, the cloned repository will not be shallow.
 
 --single-branch::
 	Clone only the history leading to the tip of a single branch,
-- 
1.8.1.80.g3e293fb.dirty

^ permalink raw reply related

* Re: [PATCH v2] status: always report ignored tracked directories
From: Junio C Hamano @ 2013-01-07 17:50 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Antoine Pelisse, Nguyen Thai Ngoc Duy, Jeff King, git
In-Reply-To: <50EB0409.1090307@web.de>

Torsten Bögershausen <tboegi@web.de> writes:

> The bad news: the patch does not apply.

Huh, isn't this already queued as 22ccf86 (status: always report
ignored tracked directories, 2013-01-06)?

> The good news: t7061 passes on pu,
> and dir.c seems to be changes as needed:
>
> commit 1f4e17c6c9833f17dc6bbf045f8a8d6378dcb417
> Merge: dee1fa4 cc37e5b
> Author: Junio C Hamano <gitster@pobox.com>
> Date: Sun Jan 6 23:46:29 2013 -0800
>
> Merge branch 'nd/parse-pathspec' into pu
>
> which comes from Duy:
>
> commit cc37e5bf18ca11d9a884bddfebcdff61df3e6279
> Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Date: Sun Jan 6 13:21:08 2013 +0700
>
> Convert more init_pathspec() to parse_pathspec()

Yes, it needs conflict resolution with other topics in flight, but
the thing to test is to see if the result of merging 22ccf86 into
the 'master' branch does what we want; the newer topic is still in
flux and we know it will be rerolled before it gets into testable
shape.

^ permalink raw reply

* Re: [PATCH 1/4] test: Add target test-lint-shell-syntax
From: Torsten Bögershausen @ 2013-01-07 17:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Torsten Bögershausen, git
In-Reply-To: <7vpq1nyvp1.fsf@alter.siamese.dyndns.org>

On 03.01.13 01:08, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> I would actually not add this to TEST_LINT by default, especially
>> when "duplicates" and "executable" that are much simpler and less
>> likely to hit false positives are not on by default.
>>
>> At least, a change to add this to TEST_LINT by default must wait to
>> be merged to any integration branch until all the fix-up topics that
>> this test triggers in the current codebase graduate to the branch.
>>
>>>> +test-lint-shell-syntax:
>>>> +	$(PERL_PATH) check-non-portable-shell.pl $(T)
>>>
>>> This is wrong if $(PERL_PATH) contains spaces, no?
>>
>> You are correct; "harness" thing in the same Makefile gets this
>> wrong, too.  I think the right invocation is:
>>
>> 	@'$(PERL_PATH_SQ)' check-non-portable.shell.pl $(T)
>>
>> although I do not offhand know if that symbol is already exported by
>> the top-level Makefile.
> 
> I'll tentatively queue this instead.  The log message has also been
> cleaned up a bit.

Sorry for late answer, but there is a problem (both linux and Mac OS X) :-(
$ make test-lint does not do shel syntax check, neither
$ make test-lint-shell-syntax

In the Makefile the the line 
	@'$(PERL_PATH_SQ)' check-non-portable-shell.pl $(T)
doesn't seem to anything (?)

Replacing @'$(PERL_PATH_SQ)' with $(PERL_PATH) gives the following,
expected result: (a very long line starting like this:)

$ make test-lint-shell-syntax
/usr/bin/perl check-non-portable-shell.pl t0000-basic.sh ......

confused...
/Torsten

^ permalink raw reply

* Re: [PATCH v2] status: always report ignored tracked directories
From: Torsten Bögershausen @ 2013-01-07 17:21 UTC (permalink / raw)
  To: Antoine Pelisse, Nguyen Thai Ngoc Duy; +Cc: Jeff King, tboegi, git
In-Reply-To: <1357510179-22852-1-git-send-email-apelisse@gmail.com>

On 06.01.13 23:09, Antoine Pelisse wrote:
[snip]
> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
> ---
>  dir.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 9b80348..f836590 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -672,7 +672,8 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len)
>
>  static struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len)
>  {
> -	if (cache_name_exists(pathname, len, ignore_case))
> +	if (!(dir->flags & DIR_SHOW_IGNORED) &&
> +	    cache_name_exists(pathname, len, ignore_case))
>  		return NULL;
>
>  	ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);
> @@ -877,11 +878,7 @@ static int treat_file(struct dir_struct *dir, struct strbuf *path, int exclude,
>  	if (exclude)
>  		exclude_file = !(dir->flags & DIR_SHOW_IGNORED);
>  	else if (dir->flags & DIR_SHOW_IGNORED) {
> -		/*
> -		 * Optimization:
> -		 * Don't spend time on indexed files, they won't be
> -		 * added to the list anyway
> -		 */
> +		/* Always exclude indexed files */
>  		struct cache_entry *ce = index_name_exists(&the_index,
>  		    path->buf, path->len, ignore_case);
>
> --
> 1.7.12.4.3.g90f5e2d
>
The bad news: the patch does not apply.
The good news: t7061 passes on pu,
and dir.c seems to be changes as needed:

commit 1f4e17c6c9833f17dc6bbf045f8a8d6378dcb417
Merge: dee1fa4 cc37e5b
Author: Junio C Hamano <gitster@pobox.com>
Date: Sun Jan 6 23:46:29 2013 -0800

Merge branch 'nd/parse-pathspec' into pu

which comes from Duy:

commit cc37e5bf18ca11d9a884bddfebcdff61df3e6279
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date: Sun Jan 6 13:21:08 2013 +0700

Convert more init_pathspec() to parse_pathspec()

^ permalink raw reply

* Re: [PATCH 1/8] Use %B for Split Subject/Body
From: Junio C Hamano @ 2013-01-07 16:53 UTC (permalink / raw)
  To: 郑文辉(Techlive Zheng); +Cc: David A. Greene, git
In-Reply-To: <CAPYzjrT_8g26y-QrYvbQYoySWskGdn15jCX60rz04wQFQ2ikVw@mail.gmail.com>

"郑文辉(Techlive Zheng)"  <techlivezheng@gmail.com> writes:

> Though, this patch defintely should be merged, becuase no one expects
> his commit message be altered durging the splitting process.

Are you saying that after double-checking what was posted?  It said
something like this below, which does not look like 'definitely
should be' to me.

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 920c664..f2b6d4a 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -296,7 +296,12 @@ copy_commit()
 	# We're going to set some environment vars here, so
 	# do it in a subshell to get rid of them safely later
 	debug copy_commit "{$1}" "{$2}" "{$3}"
+	# Use %B rather than %s%n%n%b to handle the special case of a
+	# commit that only has a subject line.  We don't want to
+	# introduce a newline after the subject, causing generation of
+	# a new hash.
 	git log -1 --pretty=format:'%an%n%ae%n%ad%n%cn%n%ce%n%cd%n%s%n%n%b' "$1" |
+#	git log -1 --pretty=format:'%an%n%ae%n%ad%n%cn%n%ce%n%cd%n%B' "$1" |
 	(
 		read GIT_AUTHOR_NAME
 		read GIT_AUTHOR_EMAIL

^ permalink raw reply related

* Re: [PATCH 4/4] t5002: check if unzip supports symlinks
From: René Scharfe @ 2013-01-07 16:50 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git discussion list, Junio C Hamano
In-Reply-To: <20130107085206.GI27909@elie.Belkin>

Am 07.01.2013 09:52, schrieb Jonathan Nieder:
> René Scharfe wrote:
>
>> Only add a symlink to the repository if both the filesystem and
>> unzip support symlinks.  To check the latter, add a ZIP file
>> containing a symlink, created like this with InfoZIP zip 3.0:
>>
>> 	$ echo sample text >textfile
>> 	$ ln -s textfile symlink
>> 	$ zip -y infozip-symlinks.zip textfile symlink
>
> Hm.  Do some implementations of "unzip" not support symlinks, or is
> the problem that some systems build Info-ZIP without the SYMLINKS
> option?

The unzip supplied with NetBSD 6.0.1, which is based on libarchive, 
doesn't support symlinks.  It creates a file with the link target path 
as its only content for such entries.

I assume that Info-ZIP is compiled with the SYMLINKS option on all 
platforms whose default filesystem supports symbolic links.  Except on 
Windows perhaps, where it's complicated.

For the test script there is no difference: If we don't have a tool to 
verify symlinks in archives, we better skip that part.

René

^ permalink raw reply

* Re: [PATCH 2/4] t0024, t5000: use test_lazy_prereq for UNZIP
From: René Scharfe @ 2013-01-07 16:28 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git discussion list, Junio C Hamano
In-Reply-To: <20130107084509.GH27909@elie.Belkin>

Am 07.01.2013 09:45, schrieb Jonathan Nieder:
> René Scharfe wrote:
>
>> --- a/t/t0024-crlf-archive.sh
>> +++ b/t/t0024-crlf-archive.sh
>> @@ -5,6 +5,11 @@ test_description='respect crlf in git archive'
>>   . ./test-lib.sh
>>   GIT_UNZIP=${GIT_UNZIP:-unzip}
>>
>> +test_lazy_prereq UNZIP '
>> +	"$GIT_UNZIP" -v >/dev/null 2>&1
>> +	test $? -ne 127
>
> Micronit: now that this is part of a test, there is no more need to
> silence its output.  The "unzip -v" output could be useful to people
> debugging with "t0024-crlf-archive.sh -v -i".

Oh, yes, good point.

> With or without that change, this is a nice cleanup and obviously
> correct, so
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks,
René

^ permalink raw reply

* Re: [PATCH 1/4] t0024, t5000: clear variable UNZIP, use GIT_UNZIP instead
From: René Scharfe @ 2013-01-07 16:25 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git discussion list, Junio C Hamano
In-Reply-To: <20130107051609.GB27909@elie.Belkin>

Am 07.01.2013 06:16, schrieb Jonathan Nieder:
> René Scharfe wrote:
>
>> InfoZIP's unzip takes default parameters from the environment variable
>> UNZIP.  Unset it in the test library and use GIT_UNZIP for specifying
>> alternate versions of the unzip command instead.
>>
>> t0024 wasn't even using variable for the actual extraction.  t5000
>> was, but when setting it to InfoZIP's unzip it would try to extract
>> from itself (because it treats the contents of $UNZIP as parameters),
>> which failed of course.
>
> That would only happen if the UNZIP variable was already exported,
> right?

We don't want any parameters a user may have been specified influence 
the test.  I'm not sure if someone actually sets that variable for that 
purpose, though.

My main use case is running individual test scripts with an alternative 
unzip binary, and with the patch this works as expected:

	$ cd t
	$ GIT_UNZIP=/usr/pkg/bin/unzip ./t5000-tar-tree.sh

> The patch makes sense and takes care of all uses of ${UNZIP} I can
> find, and it even makes the quoting consistent so a person can put
> their copy of unzip under "/Program Files".  For what it's worth,
>
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks!

René

^ permalink raw reply

* Re: Moving (renaming) submodules, recipe/script
From: Junio C Hamano @ 2013-01-07 16:08 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Jonathan Nieder, W. Trevor King, Git, Peter Collingbourne,
	mbranchaud, Michael J Gruber
In-Reply-To: <50EA84E9.9030702@web.de>

Jens Lehmann <Jens.Lehmann@web.de> writes:

> Right, and me thinks that would warrant a --force option for deinit
> to do that even if the submodule contains local changes (which would
> make deinit fail otherwise).

Probably.

> Additionally Michael and Marc spoke up
> that they would rather have a --all option to deinit all initialized
> submodules and "git submodule deinit" without any arguments should
> just produce a usage message. As I saw no voices against it that'll
> be part of the next iteration too.

Yeah, I forgot about that possible surprise of deiniting everything
under the sun by default.

I am not sure if "--all" is a good way forward, though.

Can you defeat it with "git submodule deinit ./--all" or something
to limit the target only to one submodule whose unfortunate location
is named as such?  If you have such a support, I have this suspicion
that you already get a short and explicit way to say "everything
under the current directory" with "git submodule deinit ." for free.

^ permalink raw reply

* Re: [PATCH] Add getenv.so for catching invalid getenv() use via LD_PRELOAD
From: David Michael @ 2013-01-07 15:45 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git@vger.kernel.org, Junio C Hamano
In-Reply-To: <1357376146-7155-1-git-send-email-pclouds@gmail.com>

Hi,

On Sat, Jan 5, 2013 at 3:55 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>  Perhaps this will help the getenv bug hunting (I assume we do the
>  hunting on Linux platform only). So far it catches this and is stuck
>  at getenv in git_pager().

For the record: I have been testing a macro pointing getenv at an
alternate implementation for the purpose of my port.  This alternate
function will return a unique pointer for each variable, as opposed to
using the same static buffer.  IBM considers this function a
proprietary z/OS extension (whereas the other is labelled as
conforming to half a dozen standards), but it seems to be more in-line
with the behavior git expects.

So I agree this patch is useful for reaching strict conformance to the
published specs, but I think we can go back to assuming no known
platforms are broken and unusable because of this.

Thanks.

David

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox