Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 3/6] Glean libexec path from argv[0] for git-upload-pack and git-receive-pack.
From: Steffen Prohaska @ 2009-01-11 10:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Johannes Sixt,
	Steve Haslam
In-Reply-To: <7viqomx5iq.fsf@gitster.siamese.dyndns.org>


On Jan 10, 2009, at 9:36 PM, Junio C Hamano wrote:

> Steffen Prohaska <prohaska@zib.de> writes:
>
>> On Jan 10, 2009, at 3:34 PM, Johannes Schindelin wrote:
>>
>>> Logically, and to avoid committing a broken revision, 1/6 should  
>>> come
>>> last, methinks.
>>
>> RUNTIME_PREFIX is not defined before 6/6. But I agree,
>> 1/6 should probably be moved after 5/6.
>>
> Hmm, I actually was thinking about applying that (and that one only)  
> early
> to my tree, to make sure it is regression-free.

Does "early" mean that you want to wait and see how the discussion
about the other patches evolves before you consider applying them,
or does "and that one only" mean that you tend to not apply the
other patches at all?

	Steffen

^ permalink raw reply

* Re: [PATCH 3/6] Glean libexec path from argv[0] for git-upload-pack and git-receive-pack.
From: Junio C Hamano @ 2009-01-11 10:21 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Johannes Schindelin, Git Mailing List, Johannes Sixt,
	Steve Haslam
In-Reply-To: <E976B246-AD14-4B03-B204-F6A1014071DF@zib.de>

Steffen Prohaska <prohaska@zib.de> writes:

>> Hmm, I actually was thinking about applying that (and that one only)
>> early
>> to my tree, to make sure it is regression-free.
>
> Does "early" mean that you want to wait and see how the discussion
> about the other patches evolves before you consider applying them,
> or does "and that one only" mean that you tend to not apply the
> other patches at all?

My initial impression after reading 1/6 was that no matter how the actual
runtime prefix detection logic that is implemented in the later parts of
the series for particular platform will turn out to be, the update to the
Makefile that is done by 1/6 won't have to change.  If I apply 1/6 first
without applying anything else, we can make sure that it would not regress
for Unix people (and catch regressions early if any), while Windows people
polish the platform specific parts of the implementation in the later
parts of the series that can be replaced.

Because changes to Makefile variables tend to have unexpected side effects
(people have their own definition to override them in their build
procedures and you can easily break them unless you are careful), I wanted
to make sure the common part is solid before waiting for the other part.

But if you think it is better not to apply any one, until other parts
mature, it is Ok by me, too.

^ permalink raw reply

* [PATCH v3 0/4] customizable --color-words
From: Thomas Rast @ 2009-01-11 10:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Teemu Likonen
In-Reply-To: <7vr63atykr.fsf@gitster.siamese.dyndns.org>

Johannes Schindelin wrote:
> On Sat, 10 Jan 2009, Thomas Rast wrote:
> > Johannes Schindelin wrote:
> > > So I still find your patch way too large

The bad news is... it just got bigger ;-)

> In your case, I imagine it would be much easier to get reviewers if you 
> had
> 
> 	patch 1/4 refactor color-words to allow for 0-character word 
> 		boundaries
> 	patch 2/4 allow regular expressions to define what makes a word

So here's a 4-patch series.  I put the first split in a different
place than you suggested, however.  I couldn't see a good way to
separate empty boundaries from regex splitting in such a way that the
first half can be exercised (is not just dead code).  1/4 basically
just rearranges code a bit and should be a real no-op patch.

Junio C Hamano wrote:
>     diff.c: In function 'scan_word_boundaries':
>     diff.c:512: warning: enumeration value 'DIFF_WORD_UNDEF' not handled in sw

Thanks, added a case to test for this.

There is one other minor semantic change in 2/4: the error reporting
in case your regex matched "foo\nbar" now says "before 'bar'" instead
of "near '\nbar'".  Other than that, there are only a bunch of added
comments when comparing the result of all four patches with v2.

Thomas Rast (4):
  word diff: comments, preparations for regex customization
  word diff: customizable word splits
  word diff: make regex configurable via attributes
  word diff: test customizable word splits

 Documentation/diff-options.txt  |   18 +++-
 Documentation/gitattributes.txt |   21 +++
 diff.c                          |  282 ++++++++++++++++++++++++++++++++++++---
 diff.h                          |    1 +
 t/t4033-diff-color-words.sh     |   90 +++++++++++++
 userdiff.c                      |   27 +++-
 userdiff.h                      |    1 +
 7 files changed, 413 insertions(+), 27 deletions(-)
 create mode 100755 t/t4033-diff-color-words.sh

^ permalink raw reply

* [PATCH v3 1/4] word diff: comments, preparations for regex customization
From: Thomas Rast @ 2009-01-11 10:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Teemu Likonen
In-Reply-To: <7vr63atykr.fsf@gitster.siamese.dyndns.org>

This reorganizes the code for diff --color-words in a way that will be
convenient for the next patch, without changing any of the semantics.
The new variables are not used yet except for their default state.

We also add some comments on the workings of diff_words_show() and
associated helper routines.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 diff.c |   86 +++++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 69 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index d235482..f274bf5 100644
--- a/diff.c
+++ b/diff.c
@@ -316,11 +316,21 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
 	return 0;
 }
 
+/* unused right now */
+enum diff_word_boundaries {
+	DIFF_WORD_UNDEF,
+	DIFF_WORD_BODY,
+	DIFF_WORD_END,
+	DIFF_WORD_SPACE,
+	DIFF_WORD_SKIP
+};
+
 struct diff_words_buffer {
 	mmfile_t text;
 	long alloc;
 	long current; /* output pointer */
 	int suppressed_newline;
+	enum diff_word_boundaries *boundaries;
 };
 
 static void diff_words_append(char *line, unsigned long len,
@@ -339,16 +349,23 @@ static void diff_words_append(char *line, unsigned long len,
 struct diff_words_data {
 	struct diff_words_buffer minus, plus;
 	FILE *file;
+	regex_t *word_regex; /* currently unused */
 };
 
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
+/*
+ * Print 'len' characters from the "real" diff data in 'buffer'.  Also
+ * returns how many characters were printed (currently always 'len').
+ * With 'suppress_newline', we remember a final newline instead of
+ * printing it.
+ */
+static int print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
 		int suppress_newline)
 {
 	const char *ptr;
 	int eol = 0;
 
 	if (len == 0)
-		return;
+		return len;
 
 	ptr  = buffer->text.ptr + buffer->current;
 	buffer->current += len;
@@ -368,18 +385,30 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
 		else
 			putc('\n', file);
 	}
+
+	/* we need to return how many chars to skip on the other side,
+	 * so account for the (held off) \n */
+	return len+eol;
 }
 
+/*
+ * Callback for word diff output
+ */
 static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 {
 	struct diff_words_data *diff_words = priv;
 
 	if (diff_words->minus.suppressed_newline) {
+		/* We completely drop a suppressed newline on the
+		 * minus side, if it is immediately followed by a plus
+		 * side output. This formats a word change right
+		 * before the end of line correctly */
 		if (line[0] != '+')
 			putc('\n', diff_words->file);
 		diff_words->minus.suppressed_newline = 0;
 	}
 
+	/* account for the [+- ] inserted by the line diff */
 	len--;
 	switch (line[0]) {
 		case '-':
@@ -391,8 +420,10 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 				   &diff_words->plus, len, DIFF_FILE_NEW, 0);
 			break;
 		case ' ':
-			print_word(diff_words->file,
-				   &diff_words->plus, len, DIFF_PLAIN, 0);
+			len = print_word(diff_words->file,
+					 &diff_words->plus, len, DIFF_PLAIN, 0);
+			/* skip the characters that were printed on
+			 * the other side, too */
 			diff_words->minus.current += len;
 			break;
 	}
@@ -409,22 +440,37 @@ static void diff_words_show(struct diff_words_data *diff_words)
 
 	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
-	minus.size = diff_words->minus.text.size;
-	minus.ptr = xmalloc(minus.size);
-	memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
-	for (i = 0; i < minus.size; i++)
-		if (isspace(minus.ptr[i]))
-			minus.ptr[i] = '\n';
-	diff_words->minus.current = 0;
 
-	plus.size = diff_words->plus.text.size;
-	plus.ptr = xmalloc(plus.size);
-	memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
-	for (i = 0; i < plus.size; i++)
-		if (isspace(plus.ptr[i]))
-			plus.ptr[i] = '\n';
+	/* currently always true */
+	if (!diff_words->word_regex) {
+		/*
+		 * "Simple" word diff: replace all space characters
+		 * with a newline.
+		 *
+		 * This groups together "words" of nonspaces on a line
+		 * each, which we then diff using the normal line-diff
+		 * mechanism.  It also has the nice property that
+		 * character counts/offsets stay the same.
+		 */
+		minus.size = diff_words->minus.text.size;
+		minus.ptr = xmalloc(minus.size);
+		memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
+		for (i = 0; i < minus.size; i++)
+			if (isspace(minus.ptr[i]))
+				minus.ptr[i] = '\n';
+
+		plus.size = diff_words->plus.text.size;
+		plus.ptr = xmalloc(plus.size);
+		memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
+		for (i = 0; i < plus.size; i++)
+			if (isspace(plus.ptr[i]))
+				plus.ptr[i] = '\n';
+	}
+	diff_words->minus.current = 0;
 	diff_words->plus.current = 0;
 
+	/* we want a minimal diff with enough context to run
+	 * everything into a single hunk */
 	xpp.flags = XDF_NEED_MINIMAL;
 	xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
 	xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
@@ -432,7 +478,12 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	free(minus.ptr);
 	free(plus.ptr);
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
+	/* these two are currently free(NULL) */
+	free(diff_words->minus.boundaries);
+	free(diff_words->plus.boundaries);
 
+	/* do not forget about a possible final newline that was held
+	 * back */
 	if (diff_words->minus.suppressed_newline) {
 		putc('\n', diff_words->file);
 		diff_words->minus.suppressed_newline = 0;
@@ -461,6 +512,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
 
 		free (ecbdata->diff_words->minus.text.ptr);
 		free (ecbdata->diff_words->plus.text.ptr);
+		free(ecbdata->diff_words->word_regex);
 		free(ecbdata->diff_words);
 		ecbdata->diff_words = NULL;
 	}
-- 
1.6.1.269.g0769

^ permalink raw reply related

* [PATCH v3 2/4] word diff: customizable word splits
From: Thomas Rast @ 2009-01-11 10:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Teemu Likonen
In-Reply-To: <7vr63atykr.fsf@gitster.siamese.dyndns.org>

Allows for user-configurable word splits when using --color-words.
This can make the diff more readable if the regex is configured
according to the language of the file.

Each non-overlapping match of the regex is a word; everything in
between is whitespace.  We disallow matching the empty string (because
it results in an endless loop) or a newline (breaks color escapes and
interacts badly with the input coming from the usual line diff).  To
help the user, we set REG_NEWLINE so that [^...] and . do not match
newlines.

--color-words works (and always worked) by splitting words onto one
line each, and using the normal line-diff machinery to get a word
diff.  Since we cannot reuse the current approach of simply
overwriting uninteresting characters with '\n', we insert an
artificial '\n' at the end of each detected word.  Its presence must
be tracked so that we can distinguish artificial from source newlines.

Insertion of spaces is somewhat subtle.  We echo a "context" space
twice (once on each side of the diff) if it follows directly after a
word.  While this loses a tiny bit of accuracy, it runs together long
sequences of changed word into one removed and one added block, making
the diff much more readable.  This feature also means that the
splitting regex '\S+' results in the same output as the original code.
The existing code still stays in place in case no regex is provided,
for performance.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 Documentation/diff-options.txt |   16 +++-
 diff.c                         |  196 +++++++++++++++++++++++++++++++++++++++-
 diff.h                         |    1 +
 3 files changed, 206 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 671f533..6152d5b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -91,8 +91,20 @@ endif::git-format-patch[]
 	Turn off colored diff, even when the configuration file
 	gives the default to color output.
 
---color-words::
-	Show colored word diff, i.e. color words which have changed.
+--color-words[=<regex>]::
+	Show colored word diff, i.e., color words which have changed.
+	By default, a new word only starts at whitespace, so that a
+	'word' is defined as a maximal sequence of non-whitespace
+	characters.  The optional argument <regex> can be used to
+	configure this.
++
+The <regex> must be an (extended) regular expression.  When set, every
+non-overlapping match of the <regex> is considered a word.  (Regular
+expression semantics ensure that quantifiers grab a maximal sequence
+of characters.)  Anything between these matches is considered
+whitespace and ignored for the purposes of finding differences.  You
+may want to append `|\S` to your regular expression to make sure that
+it matches all non-whitespace characters.
 
 --no-renames::
 	Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index f274bf5..badaea6 100644
--- a/diff.c
+++ b/diff.c
@@ -316,7 +316,21 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
 	return 0;
 }
 
-/* unused right now */
+/*
+ * We use these to save the word boundaries:
+ *
+ * - BODY and END track the extent of a word; after END, an artificial
+ *   newline will be inserted.
+ *
+ * - SPACE means the character is a space, and will be replaced by a
+ *   newline.  If a space follows immediately after END, it is flagged
+ *   SKIP instead, so that two words with exactly one space in between
+ *   end up with only one newline between them.
+ *
+ * - The boundary array is overallocated by one, and the spare element
+ *   is flagged UNDEF to allow peeking over the end of a word to see
+ *   if the next element is a SKIP.
+ */
 enum diff_word_boundaries {
 	DIFF_WORD_UNDEF,
 	DIFF_WORD_BODY,
@@ -349,12 +363,12 @@ static void diff_words_append(char *line, unsigned long len,
 struct diff_words_data {
 	struct diff_words_buffer minus, plus;
 	FILE *file;
-	regex_t *word_regex; /* currently unused */
+	regex_t *word_regex;
 };
 
 /*
  * Print 'len' characters from the "real" diff data in 'buffer'.  Also
- * returns how many characters were printed (currently always 'len').
+ * returns how many characters were printed.
  * With 'suppress_newline', we remember a final newline instead of
  * printing it.
  */
@@ -368,8 +382,27 @@ static int print_word(FILE *file, struct diff_words_buffer *buffer, int len, int
 		return len;
 
 	ptr  = buffer->text.ptr + buffer->current;
+
+	if (buffer->boundaries
+	    && (buffer->boundaries[buffer->current] == DIFF_WORD_BODY
+		|| buffer->boundaries[buffer->current] == DIFF_WORD_END)) {
+		/* drop the artificial newline */
+		len--;
+		/* we still have len>0 because it is a word, and
+		 * scan_word_boundaries() disallows words of length 0. */
+	}
+
 	buffer->current += len;
 
+	/* Peek past the end (this is safe because we overallocated)
+	 * to check if the next character was a skipped space. If so,
+	 * we put it together with the word. */
+	if (buffer->boundaries
+	    && buffer->boundaries[buffer->current] == DIFF_WORD_SKIP) {
+		buffer->current++;
+		len++;
+	}
+
 	if (ptr[len - 1] == '\n') {
 		eol = 1;
 		len--;
@@ -429,6 +462,141 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	}
 }
 
+/*
+ * Fancy word splitting by regex
+ *
+ * We search for all non-overlapping matches of 'pattern' in the
+ * 'buf', and define every match as a (separate) word and all
+ * unmatched characters as whitespace.
+ *
+ * Then we build a new mmfile where each word is on a line of its own.
+ * Two of these can then be line-diffed to find the word differences
+ * between the original buffers.  Unlike the normal word diff (see
+ * diff_words_show() below), the transformation does not preserve
+ * character counts, so we need to keep tracking information in
+ * buf->boundaries for later use by print_word().
+ */
+static void scan_word_boundaries(regex_t *pattern, struct diff_words_buffer *buf,
+				 mmfile_t *mmfile)
+{
+	char *text = buf->text.ptr;
+	int len = buf->text.size;
+	int i = 0;
+	/* counts how many extra characters will be inserted into the
+	 * mmfile */
+	int count = 0;
+	int ret;
+	regmatch_t matches[1];
+	int offset, wordlen;
+	char *strz, *p;
+
+	/* overallocate by 1 so we can safely peek past the end for a
+	 * SKIP, see print_word() */
+	buf->boundaries = xmalloc((len+1) * sizeof(enum diff_word_boundaries));
+	buf->boundaries[len] = DIFF_WORD_UNDEF;
+
+	if (!text) {
+		mmfile->ptr = NULL;
+		mmfile->size = 0;
+		return;
+	}
+
+	/* we unfortunately need a null-terminated copy for regexec */
+	strz = xmalloc(len+1);
+	memcpy(strz, text, len);
+	strz[len] = '\0';
+
+	while (i < len) {
+		/* iteratively match the regex against the rest of the
+		 * input string to find the next word */
+		ret = regexec(pattern, strz+i, 1, matches, 0);
+		if (ret == REG_NOMATCH) {
+			/* The rest is whitespace.  The first space
+			 * character is flagged SKIP unless there was
+			 * no preceding text at all */
+			if (i > 0 && i < len) {
+				buf->boundaries[i++] = DIFF_WORD_SKIP;
+				count--; /* SKIP characters have no output */
+			}
+			while (i < len)
+				buf->boundaries[i++] = DIFF_WORD_SPACE;
+			break;
+		}
+
+		offset = matches[0].rm_so;
+
+		/* everything up to the next word is whitespace, using
+		 * the same SKIP rule as above */
+		if (offset > 0 && i > 0) {
+			buf->boundaries[i++] = DIFF_WORD_SKIP;
+			count--;
+			offset--;
+		}
+		while (offset-- > 0)
+			buf->boundaries[i++] = DIFF_WORD_SPACE;
+
+		/* rm_eo is the first character after the match, so
+		 * this is indeed the number of characters matched */
+		wordlen = matches[0].rm_eo - matches[0].rm_so;
+
+		/* all but the last character are BODY */
+		while (wordlen > 1) {
+			if (strz[i] == '\n')
+				die("word regex matched a newline before '%s'",
+				    strz+i+1);
+			buf->boundaries[i++] = DIFF_WORD_BODY;
+			wordlen--;
+		}
+		/* the last character is END, we will insert an extra
+		 * '\n' after it */
+		if (wordlen > 0) {
+			if (strz[i] == '\n')
+				die("word regex matched a newline before '%s'",
+				    strz+i+1);
+			buf->boundaries[i++] = DIFF_WORD_END;
+			count++;
+		} else {
+			/* this would cause an endless loop, so panic */
+			die("word regex matched the empty string at '%s'",
+			    strz+i);
+		}
+	}
+
+	free(strz);
+
+	/* now build the mmfile. there will be 'count' more characters
+	 * than in the original */
+	mmfile->size = len + count;
+	mmfile->ptr = xmalloc(mmfile->size);
+	p = mmfile->ptr;
+	for (i = 0; i < len; i++) {
+		switch (buf->boundaries[i]) {
+		case DIFF_WORD_BODY: /* copy over */
+			*p++ = text[i];
+			break;
+		case DIFF_WORD_END: /* copy and insert an artificial newline */
+			*p++ = text[i];
+			*p++ = '\n';
+			break;
+		case DIFF_WORD_SPACE: /* replace by '\n' */
+			*p++ = '\n';
+			break;
+		case DIFF_WORD_SKIP:
+			/* Ignore. Since the character right before
+			 * this one is always an END, another way to
+			 * look at it is that we avoid duplicate END
+			 * newlines that are already provided by
+			 * SPACE */
+			break;
+		case DIFF_WORD_UNDEF:
+			/* can't happen, but silences a warning */
+			die("can't happen, send test case to git@vger.kernel.org");
+			break;
+		}
+	}
+}
+
+
 /* this executes the word diff on the accumulated buffers */
 static void diff_words_show(struct diff_words_data *diff_words)
 {
@@ -441,7 +609,6 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
 
-	/* currently always true */
 	if (!diff_words->word_regex) {
 		/*
 		 * "Simple" word diff: replace all space characters
@@ -465,6 +632,13 @@ static void diff_words_show(struct diff_words_data *diff_words)
 		for (i = 0; i < plus.size; i++)
 			if (isspace(plus.ptr[i]))
 				plus.ptr[i] = '\n';
+	} else {
+		/* Configurable word diff with a regex.  See
+		 * scan_word_boundaries() above. */
+		scan_word_boundaries(diff_words->word_regex,
+				     &diff_words->minus, &minus);
+		scan_word_boundaries(diff_words->word_regex,
+				     &diff_words->plus, &plus);
 	}
 	diff_words->minus.current = 0;
 	diff_words->plus.current = 0;
@@ -478,7 +652,6 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	free(minus.ptr);
 	free(plus.ptr);
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
-	/* these two are currently free(NULL) */
 	free(diff_words->minus.boundaries);
 	free(diff_words->plus.boundaries);
 
@@ -1535,6 +1708,15 @@ static void builtin_diff(const char *name_a,
 			ecbdata.diff_words =
 				xcalloc(1, sizeof(struct diff_words_data));
 			ecbdata.diff_words->file = o->file;
+			if (o->word_regex) {
+				ecbdata.diff_words->word_regex = (regex_t *)
+					xmalloc(sizeof(regex_t));
+				if (regcomp(ecbdata.diff_words->word_regex,
+					    o->word_regex,
+					    REG_EXTENDED|REG_NEWLINE))
+					die ("Invalid regular expression: %s",
+					     o->word_regex);
+			}
 		}
 		xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 			      &xpp, &xecfg, &ecb);
@@ -2546,6 +2728,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_CLR(options, COLOR_DIFF);
 	else if (!strcmp(arg, "--color-words"))
 		options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+	else if (!prefixcmp(arg, "--color-words=")) {
+		options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+		options->word_regex = arg + 14;
+	}
 	else if (!strcmp(arg, "--exit-code"))
 		DIFF_OPT_SET(options, EXIT_WITH_STATUS);
 	else if (!strcmp(arg, "--quiet"))
diff --git a/diff.h b/diff.h
index 4d5a327..23cd90c 100644
--- a/diff.h
+++ b/diff.h
@@ -98,6 +98,7 @@ struct diff_options {
 
 	int stat_width;
 	int stat_name_width;
+	const char *word_regex;
 
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
-- 
1.6.1.269.g0769

^ permalink raw reply related

* [PATCH v3 3/4] word diff: make regex configurable via attributes
From: Thomas Rast @ 2009-01-11 10:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Teemu Likonen
In-Reply-To: <7vr63atykr.fsf@gitster.siamese.dyndns.org>

Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute.  The user can then set the
driver on a file in .gitattributes.  If a regex is given on the
command line, it overrides the driver's setting.

We also provide built-in regexes for some of the languages that
already had funcname patterns.  (They are designed to run UTF-8
sequences into a single chunk to make sure they remain readable.)

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 Documentation/diff-options.txt  |    4 +++-
 Documentation/gitattributes.txt |   21 +++++++++++++++++++++
 diff.c                          |   10 ++++++++++
 userdiff.c                      |   27 +++++++++++++++++++--------
 userdiff.h                      |    1 +
 5 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 6152d5b..d22c06b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -96,7 +96,9 @@ endif::git-format-patch[]
 	By default, a new word only starts at whitespace, so that a
 	'word' is defined as a maximal sequence of non-whitespace
 	characters.  The optional argument <regex> can be used to
-	configure this.
+	configure this.  It can also be set via a diff driver, see
+	linkgit:gitattributes[1]; if a <regex> is given explicitly, it
+	overrides any diff driver setting.
 +
 The <regex> must be an (extended) regular expression.  When set, every
 non-overlapping match of the <regex> is considered a word.  (Regular
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8af22ec..67f5522 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -334,6 +334,27 @@ patterns are available:
 - `tex` suitable for source code for LaTeX documents.
 
 
+Customizing word diff
+^^^^^^^^^^^^^^^^^^^^^
+
+You can customize the rules that `git diff --color-words` uses to
+split words in a line, by specifying an appropriate regular expression
+in the "diff.*.wordregex" configuration variable.  For example, in TeX
+a backslash followed by a sequence of letters forms a command, but
+several such commands can be run together without intervening
+whitespace.  To separate them, use a regular expression such as
+
+------------------------
+[diff "tex"]
+	wordregex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{} \t]+"
+------------------------
+
+Similar to 'xfuncname', a built in value is provided for the drivers
+`bibtex`, `html`, `java`, `php`, `python` and `tex`.  See the
+documentation of --color-words in linkgit:git-diff[1] for the precise
+semantics.
+
+
 Performing text diffs of binary files
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/diff.c b/diff.c
index badaea6..c1cc426 100644
--- a/diff.c
+++ b/diff.c
@@ -1548,6 +1548,12 @@ static const struct userdiff_funcname *diff_funcname_pattern(struct diff_filespe
 	return one->driver->funcname.pattern ? &one->driver->funcname : NULL;
 }
 
+static const char *userdiff_word_regex(struct diff_filespec *one)
+{
+	diff_filespec_load_driver(one);
+	return one->driver->word_regex;
+}
+
 void diff_set_mnemonic_prefix(struct diff_options *options, const char *a, const char *b)
 {
 	if (!options->a_prefix)
@@ -1708,6 +1714,10 @@ static void builtin_diff(const char *name_a,
 			ecbdata.diff_words =
 				xcalloc(1, sizeof(struct diff_words_data));
 			ecbdata.diff_words->file = o->file;
+			if (!o->word_regex)
+				o->word_regex = userdiff_word_regex(one);
+			if (!o->word_regex)
+				o->word_regex = userdiff_word_regex(two);
 			if (o->word_regex) {
 				ecbdata.diff_words->word_regex = (regex_t *)
 					xmalloc(sizeof(regex_t));
diff --git a/userdiff.c b/userdiff.c
index 3681062..7fd9a07 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -6,13 +6,17 @@ static struct userdiff_driver *drivers;
 static int ndrivers;
 static int drivers_alloc;
 
-#define FUNCNAME(name, pattern) \
+#define FUNCNAME(name, pattern)			\
 	{ name, NULL, -1, { pattern, REG_EXTENDED } }
+#define PATTERNS(name, pattern, wordregex)			\
+	{ name, NULL, -1, { pattern, REG_EXTENDED }, NULL, wordregex }
 static struct userdiff_driver builtin_drivers[] = {
-FUNCNAME("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$"),
-FUNCNAME("java",
+PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
+	 "[^<>= \t]+|\\S"),
+PATTERNS("java",
 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
-	 "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$"),
+	 "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$",
+	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[-+*/]=|\\+\\+|--|\\S|[\x80-\xff]+"),
 FUNCNAME("objc",
 	 /* Negate C statements that can look like functions */
 	 "!^[ \t]*(do|for|if|else|return|switch|while)\n"
@@ -27,14 +31,19 @@ FUNCNAME("pascal",
 		"implementation|initialization|finalization)[ \t]*.*)$"
 	 "\n"
 	 "^(.*=[ \t]*(class|record).*)$"),
-FUNCNAME("php", "^[\t ]*((function|class).*)"),
-FUNCNAME("python", "^[ \t]*((class|def)[ \t].*)$"),
+PATTERNS("php", "^[\t ]*((function|class).*)",
+	 "\\$?[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[-+*/]=|\\+\\+|--|->|\\S|[\x80-\xff]+"),
+PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
+	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[-+*/]=|//|\\S|[\x80-\xff]+"),
 FUNCNAME("ruby", "^[ \t]*((class|module|def)[ \t].*)$"),
-FUNCNAME("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$"),
-FUNCNAME("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"),
+PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
+	 "[={}\"]|[^={}\" \t]+"),
+PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
+	 "\\\\[a-zA-Z@]+|[{}]|\\\\.|[^\\{} \t]+"),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef FUNCNAME
+#undef PATTERNS
 
 static struct userdiff_driver driver_true = {
 	"diff=true",
@@ -134,6 +143,8 @@ int userdiff_config(const char *k, const char *v)
 		return parse_string(&drv->external, k, v);
 	if ((drv = parse_driver(k, v, "textconv")))
 		return parse_string(&drv->textconv, k, v);
+	if ((drv = parse_driver(k, v, "wordregex")))
+		return parse_string(&drv->word_regex, k, v);
 
 	return 0;
 }
diff --git a/userdiff.h b/userdiff.h
index ba29457..2aab13e 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -12,6 +12,7 @@ struct userdiff_driver {
 	int binary;
 	struct userdiff_funcname funcname;
 	const char *textconv;
+	const char *word_regex;
 };
 
 int userdiff_config(const char *k, const char *v);
-- 
1.6.1.269.g0769

^ permalink raw reply related

* [PATCH v3 4/4] word diff: test customizable word splits
From: Thomas Rast @ 2009-01-11 10:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Teemu Likonen
In-Reply-To: <7vr63atykr.fsf@gitster.siamese.dyndns.org>

Several tests for regex-configured word splits via command line and
gitattributes.  For good measure we also do a basic test of the
default --color-words since it was so far not covered at all.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/t4033-diff-color-words.sh |   90 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 90 insertions(+), 0 deletions(-)
 create mode 100755 t/t4033-diff-color-words.sh

diff --git a/t/t4033-diff-color-words.sh b/t/t4033-diff-color-words.sh
new file mode 100755
index 0000000..536cdac
--- /dev/null
+++ b/t/t4033-diff-color-words.sh
@@ -0,0 +1,90 @@
+#!/bin/sh
+
+
+test_description='diff --color-words'
+. ./test-lib.sh
+
+cat <<EOF > test_a
+foo_bar_baz
+a qu_ux b c
+alpha beta gamma delta
+EOF
+
+cat <<EOF > test_b
+foo_baz_baz
+a qu_new_ux b c
+alpha 4 2 delta
+EOF
+
+# t4026-diff-color.sh tests the color escapes, so we assume they do
+# not change
+
+munge () {
+    tail -n +5 | tr '\033' '!'
+}
+
+cat <<EOF > expect-plain
+![36m@@ -1,3 +1,3 @@![m
+![31mfoo_bar_baz![m![32mfoo_baz_baz![m
+a ![m![31mqu_ux ![m![32mqu_new_ux ![mb ![mc![m
+alpha ![m![31mbeta ![m![31mgamma ![m![32m4 ![m![32m2 ![mdelta![m
+EOF
+
+test_expect_success 'default settings' '
+	git diff --no-index --color-words test_a test_b |
+		munge > actual-plain &&
+	test_cmp expect-plain actual-plain
+'
+
+test_expect_success 'trivial regex yields same as default' '
+	git diff --no-index --color-words="\\S+" test_a test_b |
+		munge > actual-trivial &&
+	test_cmp expect-plain actual-trivial
+'
+
+cat <<EOF > expect-chars
+![36m@@ -1,3 +1,3 @@![m
+f![mo![mo![m_![mb![ma![m![31mr![m![32mz![m_![mb![ma![mz![m
+a ![mq![mu![m_![m![32mn![m![32me![m![32mw![m![32m_![mu![mx ![mb ![mc![m
+a![ml![mp![mh![ma ![m![31mb![m![31me![m![31mt![m![31ma ![m![31mg![m![31ma![m![31mm![m![31mm![m![31ma ![m![32m4 ![m![32m2 ![md![me![ml![mt![ma![m
+EOF
+
+test_expect_success 'character by character regex' '
+	git diff --no-index --color-words="\\S" test_a test_b |
+		munge > actual-chars &&
+	test_cmp expect-chars actual-chars
+'
+
+cat <<EOF > expect-nontrivial
+![36m@@ -1,3 +1,3 @@![m
+foo![m_![m![31mbar![m![32mbaz![m_![mbaz![m
+a ![mqu![m_![m![32mnew![m![32m_![mux ![mb ![mc![m
+alpha ![m![31mbeta ![m![31mgamma ![m![32m4![m![32m ![m![32m2![m![32m ![mdelta![m
+EOF
+
+test_expect_success 'nontrivial regex' '
+	git diff --no-index --color-words="[a-z]+|_" test_a test_b |
+		munge > actual-nontrivial &&
+	test_cmp expect-nontrivial actual-nontrivial
+'
+
+test_expect_success 'set a diff driver' '
+	git config diff.testdriver.wordregex "\\S" &&
+	cat <<EOF > .gitattributes
+test_* diff=testdriver
+EOF
+'
+
+test_expect_success 'use default supplied by driver' '
+	git diff --no-index --color-words test_a test_b |
+		munge > actual-chars-2 &&
+	test_cmp expect-chars actual-chars-2
+'
+
+test_expect_success 'option overrides default' '
+	git diff --no-index --color-words="[a-z]+|_" test_a test_b |
+		munge > actual-nontrivial-2 &&
+	test_cmp expect-nontrivial actual-nontrivial-2
+'
+
+test_done
-- 
1.6.1.269.g0769

^ permalink raw reply related

* Re: [PATCH] filter-branch: add git_commit_non_empty_tree and --prune-empty.
From: Pierre Habouzit @ 2009-01-11 11:18 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Junio C Hamano, git, pasky, srabbelier
In-Reply-To: <76718490901091129q534ca981iac54e0653d76170d@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 979 bytes --]

On Fri, Jan 09, 2009 at 07:29:15PM +0000, Jay Soffian wrote:
> On Mon, Nov 3, 2008 at 10:18 AM, Pierre Habouzit <madcoder@debian.org> wrote:
> > On Mon, Nov 03, 2008 at 09:27:29AM +0000, Pierre Habouzit wrote:
> >> On Mon, Nov 03, 2008 at 04:58:44AM +0000, Junio C Hamano wrote:
> 
> Bump, http://thread.gmane.org/gmane.comp.version-control.git/99440/
> 
> (I'd like to see this included. Having a bunch of empty commits after
> using filter-branch to remove unwanted files from history is, er,
> sub-optimal, so seems like it might even be default behavior?)

Yeah I have that in my own git tree, and I meant to ask if something had
to be fixed for it to be accepted for some time, but always forget about
it.

Junio, do you think this could be accepted, or does it need some work ?

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: Funny: git -p submodule summary
From: Jeff King @ 2009-01-11 11:22 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Johannes Schindelin, git
In-Reply-To: <496728B9.7090200@viscovery.net>

On Fri, Jan 09, 2009 at 11:36:41AM +0100, Johannes Sixt wrote:

> I'll test your other patch (that replaces the execvp in git.c by
> run_command).

There is something funny with it that I have not diagnosed: aliases are
broken, and "git foobar" does not return an error. Presumably just
checking the "we did not exec succesfully" case is not triggering
properly.  However, I think the right solution is actually to refactor
git.c to figure out ahead of time whether we have a builtin, external,
or alias. I can work on that, but not tonight, as my git-time is up for
now.

But other than that, did it work for you on Windows?

However, here is a 4-patch series that handles the separate signal
delivery problem. It should fix the "^C makes funny things happen"
problems you were seeing. Please test and let me know how it works on
Windows.

The patches are:
  1/4: Makefile: clean up TEST_PROGRAMS definition
  2/4: chain kill signals for cleanup functions
  3/4: refactor signal handling for cleanup functions
  4/4: pager: do wait_for_pager on signal death

-Peff

^ permalink raw reply

* [PATCH 1/4] Makefile: clean up TEST_PROGRAMS definition
From: Jeff King @ 2009-01-11 11:25 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Johannes Schindelin, git
In-Reply-To: <20090111112222.GA29656@coredump.intra.peff.net>

We try to keep lines under 80 characters, not to mention
that sticking a bunch of stuff on one line makes diffs
messier.

Signed-off-by: Jeff King <peff@peff.net>
---
Just a cleanup I noticed while working on 2/4.

 Makefile |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index dee97c1..2b873fa 100644
--- a/Makefile
+++ b/Makefile
@@ -1356,7 +1356,14 @@ endif
 
 ### Testing rules
 
-TEST_PROGRAMS = test-chmtime$X test-genrandom$X test-date$X test-delta$X test-sha1$X test-match-trees$X test-parse-options$X test-path-utils$X
+TEST_PROGRAMS += test-chmtime$X
+TEST_PROGRAMS += test-date$X
+TEST_PROGRAMS += test-delta$X
+TEST_PROGRAMS += test-genrandom$X
+TEST_PROGRAMS += test-match-trees$X
+TEST_PROGRAMS += test-parse-options$X
+TEST_PROGRAMS += test-path-utils$X
+TEST_PROGRAMS += test-sha1$X
 
 all:: $(TEST_PROGRAMS)
 
-- 
1.6.1.84.g8150

^ permalink raw reply related

* [PATCH 2/4] chain kill signals for cleanup functions
From: Jeff King @ 2009-01-11 11:32 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Johannes Schindelin, git
In-Reply-To: <20090111112222.GA29656@coredump.intra.peff.net>

If a piece of code wanted to do some cleanup before exiting
(e.g., cleaing up a lockfile or a tempfile), our usual
strategy was to install a signal handler that did something
like this:

  do_cleanup(); /* actual work */
  signal(signo, SIG_DFL); /* restore previous behavior */
  raise(signo); /* deliver signal, killing ourselves */

For a single handler, this works fine. However, if we want
to clean up two _different_ things, we run into a problem.
The most recently installed handler will run, but when it
removes itself as a handler, it doesn't put back the first
handler.

This patch introduces sigchain, a tiny library for handling
a stack of signal handlers. You sigchain_push each handler,
and use sigchain_pop to restore whoever was before you in
the stack.

Signed-off-by: Jeff King <peff@peff.net>
---
I don't know if it is possible to trigger this issue with current git or
not. There are several instances that catch signals in different ways,
but I don't know if there are code paths that actually combine them. So
maybe it is fixing a bug, but it is definitely needed for 4/4, which
will install the wait_for_pager handler in most git runs.

I tried to neither over- nor under-engineer. On one hand, you could
actually accomplish the same thing by always keeping a "this was the
last handler" pointer at each callsite when installing the new handler.
But it's easy to mess up, so I think a little library helps keep the
code clean.

On the other hand, callsites still have to have separate clean_foo() and
clean_foo_on_signal() handlers, and manually pop() and raise() in the
latter. If the code just assumed you always wanted to chain to the next
handler immediately, then we could cut out even more client code.

The test script is probably overkill, but I wrote it to convince myself
I hadn't screwed up too badly (and because we probably wouldn't even
notice, as signals aren't well tested in our scripts). So I figured I
might as well share it.

 .gitignore            |    1 +
 Makefile              |    3 +++
 builtin-clone.c       |    5 +++--
 builtin-fetch--tool.c |    5 +++--
 builtin-fetch.c       |    5 +++--
 diff.c                |    5 +++--
 http-push.c           |   11 ++++++-----
 lockfile.c            |   13 +++++++------
 sigchain.c            |   43 +++++++++++++++++++++++++++++++++++++++++++
 sigchain.h            |    9 +++++++++
 t/t0005-signals.sh    |   18 ++++++++++++++++++
 test-sigchain.c       |   22 ++++++++++++++++++++++
 12 files changed, 121 insertions(+), 19 deletions(-)
 create mode 100644 sigchain.c
 create mode 100644 sigchain.h
 create mode 100755 t/t0005-signals.sh
 create mode 100644 test-sigchain.c

diff --git a/.gitignore b/.gitignore
index d9adce5..f28a54d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -152,6 +152,7 @@ test-match-trees
 test-parse-options
 test-path-utils
 test-sha1
+test-sigchain
 common-cmds.h
 *.tar.gz
 *.dsc
diff --git a/Makefile b/Makefile
index 2b873fa..fd02dec 100644
--- a/Makefile
+++ b/Makefile
@@ -388,6 +388,7 @@ LIB_H += revision.h
 LIB_H += run-command.h
 LIB_H += sha1-lookup.h
 LIB_H += sideband.h
+LIB_H += sigchain.h
 LIB_H += strbuf.h
 LIB_H += tag.h
 LIB_H += transport.h
@@ -481,6 +482,7 @@ LIB_OBJS += sha1-lookup.o
 LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
+LIB_OBJS += sigchain.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
@@ -1364,6 +1366,7 @@ TEST_PROGRAMS += test-match-trees$X
 TEST_PROGRAMS += test-parse-options$X
 TEST_PROGRAMS += test-path-utils$X
 TEST_PROGRAMS += test-sha1$X
+TEST_PROGRAMS += test-sigchain$X
 
 all:: $(TEST_PROGRAMS)
 
diff --git a/builtin-clone.c b/builtin-clone.c
index f1a1a0c..18b9392 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -19,6 +19,7 @@
 #include "strbuf.h"
 #include "dir.h"
 #include "pack-refs.h"
+#include "sigchain.h"
 
 /*
  * Overall FIXMEs:
@@ -288,7 +289,7 @@ static void remove_junk(void)
 static void remove_junk_on_signal(int signo)
 {
 	remove_junk();
-	signal(SIGINT, SIG_DFL);
+	sigchain_pop(signo);
 	raise(signo);
 }
 
@@ -438,7 +439,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 	junk_git_dir = git_dir;
 	atexit(remove_junk);
-	signal(SIGINT, remove_junk_on_signal);
+	sigchain_push(SIGINT, remove_junk_on_signal);
 
 	setenv(CONFIG_ENVIRONMENT, xstrdup(mkpath("%s/config", git_dir)), 1);
 
diff --git a/builtin-fetch--tool.c b/builtin-fetch--tool.c
index 469b07e..b1d7f8f 100644
--- a/builtin-fetch--tool.c
+++ b/builtin-fetch--tool.c
@@ -2,6 +2,7 @@
 #include "cache.h"
 #include "refs.h"
 #include "commit.h"
+#include "sigchain.h"
 
 static char *get_stdin(void)
 {
@@ -186,7 +187,7 @@ static void remove_keep(void)
 static void remove_keep_on_signal(int signo)
 {
 	remove_keep();
-	signal(SIGINT, SIG_DFL);
+	sigchain_pop(signo);
 	raise(signo);
 }
 
@@ -245,7 +246,7 @@ static int fetch_native_store(FILE *fp,
 	char buffer[1024];
 	int err = 0;
 
-	signal(SIGINT, remove_keep_on_signal);
+	sigchain_push(SIGINT, remove_keep_on_signal);
 	atexit(remove_keep);
 
 	while (fgets(buffer, sizeof(buffer), stdin)) {
diff --git a/builtin-fetch.c b/builtin-fetch.c
index de6f307..8c86974 100644
--- a/builtin-fetch.c
+++ b/builtin-fetch.c
@@ -10,6 +10,7 @@
 #include "transport.h"
 #include "run-command.h"
 #include "parse-options.h"
+#include "sigchain.h"
 
 static const char * const builtin_fetch_usage[] = {
 	"git fetch [options] [<repository> <refspec>...]",
@@ -58,7 +59,7 @@ static void unlock_pack(void)
 static void unlock_pack_on_signal(int signo)
 {
 	unlock_pack();
-	signal(SIGINT, SIG_DFL);
+	sigchain_pop(signo);
 	raise(signo);
 }
 
@@ -672,7 +673,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		ref_nr = j;
 	}
 
-	signal(SIGINT, unlock_pack_on_signal);
+	sigchain_push(SIGINT, unlock_pack_on_signal);
 	atexit(unlock_pack);
 	exit_code = do_fetch(transport,
 			parse_fetch_refspec(ref_nr, refs), ref_nr);
diff --git a/diff.c b/diff.c
index d235482..5a74012 100644
--- a/diff.c
+++ b/diff.c
@@ -12,6 +12,7 @@
 #include "run-command.h"
 #include "utf8.h"
 #include "userdiff.h"
+#include "sigchain.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -1933,7 +1934,7 @@ static void remove_tempfile(void)
 static void remove_tempfile_on_signal(int signo)
 {
 	remove_tempfile();
-	signal(SIGINT, SIG_DFL);
+	sigchain_pop(signo);
 	raise(signo);
 }
 
@@ -1968,7 +1969,7 @@ static void run_external_diff(const char *pgm,
 			atexit_asked = 1;
 			atexit(remove_tempfile);
 		}
-		signal(SIGINT, remove_tempfile_on_signal);
+		sigchain_push(SIGINT, remove_tempfile_on_signal);
 	}
 
 	if (one && two) {
diff --git a/http-push.c b/http-push.c
index a4b7d08..dec395d 100644
--- a/http-push.c
+++ b/http-push.c
@@ -10,6 +10,7 @@
 #include "exec_cmd.h"
 #include "remote.h"
 #include "list-objects.h"
+#include "sigchain.h"
 
 #include <expat.h>
 
@@ -1363,7 +1364,7 @@ static void remove_locks(void)
 static void remove_locks_on_signal(int signo)
 {
 	remove_locks();
-	signal(signo, SIG_DFL);
+	sigchain_pop(signo);
 	raise(signo);
 }
 
@@ -2261,10 +2262,10 @@ int main(int argc, char **argv)
 		goto cleanup;
 	}
 
-	signal(SIGINT, remove_locks_on_signal);
-	signal(SIGHUP, remove_locks_on_signal);
-	signal(SIGQUIT, remove_locks_on_signal);
-	signal(SIGTERM, remove_locks_on_signal);
+	sigchain_push(SIGINT, remove_locks_on_signal);
+	sigchain_push(SIGHUP, remove_locks_on_signal);
+	sigchain_push(SIGQUIT, remove_locks_on_signal);
+	sigchain_push(SIGTERM, remove_locks_on_signal);
 
 	/* Check whether the remote has server info files */
 	remote->can_update_info_refs = 0;
diff --git a/lockfile.c b/lockfile.c
index 8589155..3cd57dc 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2005, Junio C Hamano
  */
 #include "cache.h"
+#include "sigchain.h"
 
 static struct lock_file *lock_file_list;
 static const char *alternate_index_output;
@@ -24,7 +25,7 @@ static void remove_lock_file(void)
 static void remove_lock_file_on_signal(int signo)
 {
 	remove_lock_file();
-	signal(signo, SIG_DFL);
+	sigchain_pop(signo);
 	raise(signo);
 }
 
@@ -136,11 +137,11 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
 	lk->fd = open(lk->filename, O_RDWR | O_CREAT | O_EXCL, 0666);
 	if (0 <= lk->fd) {
 		if (!lock_file_list) {
-			signal(SIGINT, remove_lock_file_on_signal);
-			signal(SIGHUP, remove_lock_file_on_signal);
-			signal(SIGTERM, remove_lock_file_on_signal);
-			signal(SIGQUIT, remove_lock_file_on_signal);
-			signal(SIGPIPE, remove_lock_file_on_signal);
+			sigchain_push(SIGINT, remove_lock_file_on_signal);
+			sigchain_push(SIGHUP, remove_lock_file_on_signal);
+			sigchain_push(SIGTERM, remove_lock_file_on_signal);
+			sigchain_push(SIGQUIT, remove_lock_file_on_signal);
+			sigchain_push(SIGPIPE, remove_lock_file_on_signal);
 			atexit(remove_lock_file);
 		}
 		lk->owner = getpid();
diff --git a/sigchain.c b/sigchain.c
new file mode 100644
index 0000000..a18d505
--- /dev/null
+++ b/sigchain.c
@@ -0,0 +1,43 @@
+#include "sigchain.h"
+#include "cache.h"
+
+#define SIGCHAIN_MAX_SIGNALS 32
+
+struct sigchain_signal {
+	sigchain_fun *old;
+	int n;
+	int alloc;
+};
+static struct sigchain_signal signals[SIGCHAIN_MAX_SIGNALS];
+
+static void check_signum(int sig)
+{
+	if (sig < 1 || sig >= SIGCHAIN_MAX_SIGNALS)
+		die("BUG: signal out of range: %d", sig);
+}
+
+int sigchain_push(int sig, sigchain_fun f)
+{
+	struct sigchain_signal *s = signals + sig;
+	check_signum(sig);
+
+	ALLOC_GROW(s->old, s->n + 1, s->alloc);
+	s->old[s->n] = signal(sig, f);
+	if (s->old[s->n] == SIG_ERR)
+		return -1;
+	s->n++;
+	return 0;
+}
+
+int sigchain_pop(int sig)
+{
+	struct sigchain_signal *s = signals + sig;
+	check_signum(sig);
+	if (s->n < 1)
+		return 0;
+
+	if (signal(sig, s->old[s->n - 1]) == SIG_ERR)
+		return -1;
+	s->n--;
+	return 0;
+}
diff --git a/sigchain.h b/sigchain.h
new file mode 100644
index 0000000..254ebb0
--- /dev/null
+++ b/sigchain.h
@@ -0,0 +1,9 @@
+#ifndef SIGCHAIN_H
+#define SIGCHAIN_H
+
+typedef void (*sigchain_fun)(int);
+
+int sigchain_push(int sig, sigchain_fun f);
+int sigchain_pop(int sig);
+
+#endif /* SIGCHAIN_H */
diff --git a/t/t0005-signals.sh b/t/t0005-signals.sh
new file mode 100755
index 0000000..d900df4
--- /dev/null
+++ b/t/t0005-signals.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+test_description='signals work as we expect'
+. ./test-lib.sh
+
+cat >expect <<EOF
+three
+two
+one
+EOF
+
+test_expect_success 'sigchain works' '
+  test-sigchain >actual
+  test "$?" = 130 &&
+  test_cmp expect actual
+'
+
+test_done
diff --git a/test-sigchain.c b/test-sigchain.c
new file mode 100644
index 0000000..8747dea
--- /dev/null
+++ b/test-sigchain.c
@@ -0,0 +1,22 @@
+#include "sigchain.h"
+#include "cache.h"
+
+#define X(f) \
+static void f(int sig) { \
+	puts(#f); \
+	fflush(stdout); \
+	sigchain_pop(sig); \
+	raise(sig); \
+}
+X(one)
+X(two)
+X(three)
+#undef X
+
+int main(int argc, char **argv) {
+	sigchain_push(SIGINT, one);
+	sigchain_push(SIGINT, two);
+	sigchain_push(SIGINT, three);
+	raise(SIGINT);
+	return 0;
+}
-- 
1.6.1.84.g8150

^ permalink raw reply related

* [PATCH 3/4] refactor signal handling for cleanup functions
From: Jeff King @ 2009-01-11 11:36 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Johannes Schindelin, git
In-Reply-To: <20090111112222.GA29656@coredump.intra.peff.net>

The current code is very inconsistent about which signals
are caught for doing cleanup of temporary files and lock
files. Some callsites checked only SIGINT, while others
checked a variety of death-dealing signals.

This patch factors out those signals to a single function,
and then calls it everywhere. For some sites, that means
this is a simple clean up. For others, it is an improvement
in that they will now properly clean themselves up after a
larger variety of signals.

Signed-off-by: Jeff King <peff@peff.net>
---
I'm assuming there was no good reason _not_ to be handling those other
signals at some of the "just handle SIGINT" sites. A sigchain
implementation which handled "remove_lock_file" without needing
"remove_lock_file_on_signal" could also call atexit(), too. So you could
have a one liner

  register_cleanup(remove_lock_file);

However, there is one case that doesn't use an atexit() handler:
http-push.c. I don't know if this is a bug or an intentional omission.

 builtin-clone.c       |    2 +-
 builtin-fetch--tool.c |    2 +-
 builtin-fetch.c       |    2 +-
 diff.c                |    2 +-
 http-push.c           |    5 +----
 lockfile.c            |    6 +-----
 sigchain.c            |    9 +++++++++
 sigchain.h            |    2 ++
 8 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/builtin-clone.c b/builtin-clone.c
index 18b9392..44c8073 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -439,7 +439,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 	junk_git_dir = git_dir;
 	atexit(remove_junk);
-	sigchain_push(SIGINT, remove_junk_on_signal);
+	sigchain_push_common(remove_junk_on_signal);
 
 	setenv(CONFIG_ENVIRONMENT, xstrdup(mkpath("%s/config", git_dir)), 1);
 
diff --git a/builtin-fetch--tool.c b/builtin-fetch--tool.c
index b1d7f8f..29356d2 100644
--- a/builtin-fetch--tool.c
+++ b/builtin-fetch--tool.c
@@ -246,7 +246,7 @@ static int fetch_native_store(FILE *fp,
 	char buffer[1024];
 	int err = 0;
 
-	sigchain_push(SIGINT, remove_keep_on_signal);
+	sigchain_push_common(remove_keep_on_signal);
 	atexit(remove_keep);
 
 	while (fgets(buffer, sizeof(buffer), stdin)) {
diff --git a/builtin-fetch.c b/builtin-fetch.c
index 8c86974..1e4a3d9 100644
--- a/builtin-fetch.c
+++ b/builtin-fetch.c
@@ -673,7 +673,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		ref_nr = j;
 	}
 
-	sigchain_push(SIGINT, unlock_pack_on_signal);
+	sigchain_push_common(unlock_pack_on_signal);
 	atexit(unlock_pack);
 	exit_code = do_fetch(transport,
 			parse_fetch_refspec(ref_nr, refs), ref_nr);
diff --git a/diff.c b/diff.c
index 5a74012..7fc8512 100644
--- a/diff.c
+++ b/diff.c
@@ -1969,7 +1969,7 @@ static void run_external_diff(const char *pgm,
 			atexit_asked = 1;
 			atexit(remove_tempfile);
 		}
-		sigchain_push(SIGINT, remove_tempfile_on_signal);
+		sigchain_push_common(remove_tempfile_on_signal);
 	}
 
 	if (one && two) {
diff --git a/http-push.c b/http-push.c
index dec395d..7d5c23e 100644
--- a/http-push.c
+++ b/http-push.c
@@ -2262,10 +2262,7 @@ int main(int argc, char **argv)
 		goto cleanup;
 	}
 
-	sigchain_push(SIGINT, remove_locks_on_signal);
-	sigchain_push(SIGHUP, remove_locks_on_signal);
-	sigchain_push(SIGQUIT, remove_locks_on_signal);
-	sigchain_push(SIGTERM, remove_locks_on_signal);
+	sigchain_push_common(remove_locks_on_signal);
 
 	/* Check whether the remote has server info files */
 	remote->can_update_info_refs = 0;
diff --git a/lockfile.c b/lockfile.c
index 3cd57dc..021c337 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -137,11 +137,7 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
 	lk->fd = open(lk->filename, O_RDWR | O_CREAT | O_EXCL, 0666);
 	if (0 <= lk->fd) {
 		if (!lock_file_list) {
-			sigchain_push(SIGINT, remove_lock_file_on_signal);
-			sigchain_push(SIGHUP, remove_lock_file_on_signal);
-			sigchain_push(SIGTERM, remove_lock_file_on_signal);
-			sigchain_push(SIGQUIT, remove_lock_file_on_signal);
-			sigchain_push(SIGPIPE, remove_lock_file_on_signal);
+			sigchain_push_common(remove_lock_file_on_signal);
 			atexit(remove_lock_file);
 		}
 		lk->owner = getpid();
diff --git a/sigchain.c b/sigchain.c
index a18d505..1118b99 100644
--- a/sigchain.c
+++ b/sigchain.c
@@ -41,3 +41,12 @@ int sigchain_pop(int sig)
 	s->n--;
 	return 0;
 }
+
+void sigchain_push_common(sigchain_fun f)
+{
+	sigchain_push(SIGINT, f);
+	sigchain_push(SIGHUP, f);
+	sigchain_push(SIGTERM, f);
+	sigchain_push(SIGQUIT, f);
+	sigchain_push(SIGPIPE, f);
+}
diff --git a/sigchain.h b/sigchain.h
index 254ebb0..618083b 100644
--- a/sigchain.h
+++ b/sigchain.h
@@ -6,4 +6,6 @@ typedef void (*sigchain_fun)(int);
 int sigchain_push(int sig, sigchain_fun f);
 int sigchain_pop(int sig);
 
+void sigchain_push_common(sigchain_fun f);
+
 #endif /* SIGCHAIN_H */
-- 
1.6.1.84.g8150

^ permalink raw reply related

* [PATCH 4/4] pager: do wait_for_pager on signal death
From: Jeff King @ 2009-01-11 11:36 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Johannes Schindelin, git
In-Reply-To: <20090111112222.GA29656@coredump.intra.peff.net>

Since ea27a18 (spawn pager via run_command interface), the
original git process actually does git work, and the pager
is a child process (actually, on Windows it has always been
that way, since Windows lacks fork). After spawning the
pager, we register an atexit() handler that waits for the
pager to finish.

Unfortunately, that handler does not always run. In
particular, if git is killed by a signal, then we exit
immediately. The calling shell then thinks that git is done;
however, the pager is still trying to run and impact the
terminal. The result can be seen by running a long git
process with a pager (e.g., "git log -p") and hitting ^C.
Depending on your config, you should see the shell prompt,
but pressing a key causes the pager to do any terminal
de-initialization sequence.

This patch just intercepts any death-dealing signals and
waits for the pager before dying. Under typical less
configuration, that means hitting ^C will cause git to stop
generating output, but the pager will keep running.

Signed-off-by: Jeff King <peff@peff.net>
---
 pager.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/pager.c b/pager.c
index f19ddbc..4921843 100644
--- a/pager.c
+++ b/pager.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "run-command.h"
+#include "sigchain.h"

 /*
  * This is split up from the rest of git so that we can do
@@ -38,6 +39,13 @@ static void wait_for_pager(void)
 	finish_command(&pager_process);
 }

+static void wait_for_pager_signal(int signo)
+{
+	wait_for_pager();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
 void setup_pager(void)
 {
 	const char *pager = getenv("GIT_PAGER");
@@ -75,6 +83,7 @@ void setup_pager(void)
 	close(pager_process.in);

 	/* this makes sure that the parent terminates after the pager */
+	sigchain_push_common(wait_for_pager_signal);
 	atexit(wait_for_pager);
 }

-- 
1.6.1.84.g8150

^ permalink raw reply related

* Re: [PATCH 2/4] chain kill signals for cleanup functions
From: Jeff King @ 2009-01-11 11:40 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Johannes Schindelin, git
In-Reply-To: <20090111113211.GB29791@coredump.intra.peff.net>

On Sun, Jan 11, 2009 at 06:32:12AM -0500, Jeff King wrote:

> @@ -1968,7 +1969,7 @@ static void run_external_diff(const char *pgm,
>  			atexit_asked = 1;
>  			atexit(remove_tempfile);
>  		}
> -		signal(SIGINT, remove_tempfile_on_signal);
> +		sigchain_push(SIGINT, remove_tempfile_on_signal);

Hmm.

Note that because we are now pushing instead of just replacing the
signal handler, it might matter if it gets called multiple times (though
I think most of the cleanup functions are relatively harmless if run
multiple times). Most of the callsites protect against installing the
signal handler twice, but I think this one should probably be moved up
inside the atexit_asked condition:

  if (! atexit_asked &&
      (temp[0].name == temp[0].tmp_path ||
       temp[1].name == temp[1].tmp_path)) {
          atexit_asked = 1;
          atexit(remove_tempfile);
  }
  sigchain_push_common(remove_tempfile_on_signal);

-Peff

^ permalink raw reply

* [PATCH] Allow cloning to an existing empty directory
From: Alexander Potashev @ 2009-01-11 12:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Alexander Potashev
In-Reply-To: <1231589270-15812-1-git-send-email-aspotashev@gmail.com>

The die() message updated accordingly.

The previous behaviour was to only allow cloning when the destination
directory doesn't exist.

Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
---
 builtin-clone.c |    8 +++++---
 dir.c           |   19 +++++++++++++++++++
 dir.h           |    2 ++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/builtin-clone.c b/builtin-clone.c
index f1a1a0c..e732f15 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -357,6 +357,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct stat buf;
 	const char *repo_name, *repo, *work_tree, *git_dir;
 	char *path, *dir;
+	int dest_exists;
 	const struct ref *refs, *head_points_at, *remote_head, *mapped_refs;
 	struct strbuf key = STRBUF_INIT, value = STRBUF_INIT;
 	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
@@ -406,8 +407,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		dir = guess_dir_name(repo_name, is_bundle, option_bare);
 	strip_trailing_slashes(dir);
 
-	if (!stat(dir, &buf))
-		die("destination directory '%s' already exists.", dir);
+	if ((dest_exists = !stat(dir, &buf)) && !is_empty_dir(dir))
+		die("destination path '%s' already exists and is not "
+			"an empty directory.", dir);
 
 	strbuf_addf(&reflog_msg, "clone: from %s", repo);
 
@@ -431,7 +433,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		if (safe_create_leading_directories_const(work_tree) < 0)
 			die("could not create leading directories of '%s': %s",
 					work_tree, strerror(errno));
-		if (mkdir(work_tree, 0755))
+		if (!dest_exists && mkdir(work_tree, 0755))
 			die("could not create work tree dir '%s': %s.",
 					work_tree, strerror(errno));
 		set_git_work_tree(work_tree);
diff --git a/dir.c b/dir.c
index 3347f46..7c59829 100644
--- a/dir.c
+++ b/dir.c
@@ -777,6 +777,25 @@ int is_inside_dir(const char *dir)
 	return get_relative_cwd(buffer, sizeof(buffer), dir) != NULL;
 }
 
+int is_empty_dir(const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int ret = 1;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name)) {
+			ret = 0;
+			break;
+		}
+
+	closedir(dir);
+	return ret;
+}
+
 int remove_dir_recursively(struct strbuf *path, int only_empty)
 {
 	DIR *dir = opendir(path->buf);
diff --git a/dir.h b/dir.h
index e1640a8..c950c83 100644
--- a/dir.h
+++ b/dir.h
@@ -83,6 +83,8 @@ static inline int is_dot_or_dotdot(const char *name)
 		(name[1] == '.' && name[2] == '\0')); /* "." and ".." */
 }
 
+extern int is_empty_dir(const char *dir);
+
 extern void setup_standard_excludes(struct dir_struct *dir);
 extern int remove_dir_recursively(struct strbuf *path, int only_empty);
 
-- 
1.6.1

^ permalink raw reply related

* Re: What's cooking in git.git (Jan 2009, #02; Sun, 11)
From: Alexander Potashev @ 2009-01-11 12:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v63kmtbk6.fsf@gitster.siamese.dyndns.org>

On 01:51 Sun 11 Jan     , Junio C Hamano wrote:
> [New Topics]
> 
> Need to clean up the log message, perhaps rebase it to maint-1.6.0 and
> start cooking in 'next'.
> 
> * jc/maint-format-patch (Sat Jan 10 12:41:33 2009 -0800) 1 commit
>  + format-patch: show patch text for the root commit

My testcases ([PATCH] Add new testcases for format-patch root commits)
for this don't satisfy the target behaviour.

> 
> All of the above 'pu' topics are ready for 'next'.
> 
> * ap/clone-into-empty (Fri Jan 9 02:24:23 2009 +0300) 2 commits
>  - Use is_pseudo_dir_name everywhere
>  - Allow cloning to an existing empty directory

As far as I understood from your message, you don't think that cloning
into empty directories is necessary. So, I thought, the best solution for
yesterday was "[PATCH] add is_dot_or_dotdot inline function" (to make you
happy ;)).

But the workarounds like this:

|    $ git clone -n $there it.git
|    $ mv it.git/.git . && rmdir it.git && git checkout -f

are painful, especially for newbies who have no idea about anything but
'git clone'.

> 
> There is an updated patch that only refactors the repeated code to check
> if a dirent is dot or dot-dot posted, which I should have picked up to
> replace these but I haven't yet (the "clone into empty" can and should
> build on top of it).
> 


Btw, I've sent some worthwhile patches, I but haven't got any reply from you:
	[PATCH] use || instead of | in logical expressions
	[PATCH] Replace deprecated dashed git commands in usage
	[PATCH] remove unnecessary 'if'
It's better if you say "No" than nothing.

^ permalink raw reply

* Re: [PATCH 3/6] Glean libexec path from argv[0] for git-upload-pack and git-receive-pack.
From: Steffen Prohaska @ 2009-01-11 12:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Johannes Sixt,
	Steve Haslam
In-Reply-To: <7v1vvata6o.fsf@gitster.siamese.dyndns.org>


On Jan 11, 2009, at 11:21 AM, Junio C Hamano wrote:

> My initial impression after reading 1/6 was that no matter how the  
> actual
> runtime prefix detection logic that is implemented in the later  
> parts of
> the series for particular platform will turn out to be, the update  
> to the
> Makefile that is done by 1/6 won't have to change.  If I apply 1/6  
> first
> without applying anything else, we can make sure that it would not  
> regress
> for Unix people (and catch regressions early if any), while Windows  
> people
> polish the platform specific parts of the implementation in the later
> parts of the series that can be replaced.
>
> Because changes to Makefile variables tend to have unexpected side  
> effects
> (people have their own definition to override them in their build
> procedures and you can easily break them unless you are careful), I  
> wanted
> to make sure the common part is solid before waiting for the other  
> part.
>
> But if you think it is better not to apply any one, until other parts
> mature, it is Ok by me, too.

I think you should apply it.

	Steffen

^ permalink raw reply

* [PATCH] bash completion: refactor diff options
From: Thomas Rast @ 2009-01-11 13:14 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce

diff, log and show all take the same diff options.  Refactor them from
__git_diff and __git_log into a variable, and complete them in
__git_show too.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

---

This conflicts with the --patience series; I can send a version based
on that if it goes in first.


 contrib/completion/git-completion.bash |   38 ++++++++++++++++++-------------
 1 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 7b074d7..5017369 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -759,24 +759,29 @@ _git_describe ()
 	__gitcomp "$(__git_refs)"
 }
 
-_git_diff ()
-{
-	__git_has_doubledash && return
-
-	local cur="${COMP_WORDS[COMP_CWORD]}"
-	case "$cur" in
-	--*)
-		__gitcomp "--cached --stat --numstat --shortstat --summary
+__git_diff_common_options="--stat --numstat --shortstat --summary
 			--patch-with-stat --name-only --name-status --color
 			--no-color --color-words --no-renames --check
 			--full-index --binary --abbrev --diff-filter=
-			--find-copies-harder --pickaxe-all --pickaxe-regex
+			--find-copies-harder
 			--text --ignore-space-at-eol --ignore-space-change
 			--ignore-all-space --exit-code --quiet --ext-diff
 			--no-ext-diff
 			--no-prefix --src-prefix= --dst-prefix=
-			--base --ours --theirs
 			--inter-hunk-context=
+			--raw
+"
+
+_git_diff ()
+{
+	__git_has_doubledash && return
+
+	local cur="${COMP_WORDS[COMP_CWORD]}"
+	case "$cur" in
+	--*)
+		__gitcomp "--cached --pickaxe-all --pickaxe-regex
+			--base --ours --theirs
+			$__git_diff_common_options
 			"
 		return
 		;;
@@ -959,16 +964,15 @@ _git_log ()
 			--relative-date --date=
 			--author= --committer= --grep=
 			--all-match
-			--pretty= --name-status --name-only --raw
+			--pretty=
 			--not --all
 			--left-right --cherry-pick
 			--graph
-			--stat --numstat --shortstat
-			--decorate --diff-filter=
-			--color-words --walk-reflogs
+			--decorate
+			--walk-reflogs
 			--parents --children --full-history
 			--merge
-			--inter-hunk-context=
+			$__git_diff_common_options
 			"
 		return
 		;;
@@ -1473,7 +1477,9 @@ _git_show ()
 		return
 		;;
 	--*)
-		__gitcomp "--pretty="
+		__gitcomp "--pretty=
+			$__git_diff_common_options
+			"
 		return
 		;;
 	esac
-- 
tg: (7eb5bbd..) t/bash-complete-show (depends on: origin/master)

^ permalink raw reply related

* [PATCH/RFC v6 0/3] git checkout: optimise away lots of lstat() calls
From: Kjetil Barvik @ 2009-01-11 13:28 UTC (permalink / raw)
  To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
	Kjetil Barvik

Changes since version 5:

- After comments from René Scharfe, the usage of the lstat_stat()
  function inside entry.c has got it's own wrapper,
  has_dirs_only_path().

- All wrapper functions inside cache.h is moved inside symlink.c, and
  updated to be normal C extern functions (not inline functions).
  Thanks, René!

- The clear_lstat_cache() function is deleted, and instead we do an
  safeguard if-test to detect changes to track_flags or
  prefix_len_stat_func.

- Introduce a 'static inline reset_lstat_cache()' function, and make
  an struct of all the cache variables.

- The patch is split up into 3 parts again.

That's all changes this time!  Then some questions:

* Inside commit message for commit c40641b77b0274186fd1b327d5dc3246f814aaaf
* ("Optimize symlink/directory detection") Linus Torvalds writes
| [...]
| This can - and should - probably be extended upon so that we
| eventually never do a bare 'lstat()' on any path entries at *all*
| when checking the index, but always check the full path
| carefully. Right now we do not generally check the whole path for
| all our normal quick index revalidation.

  I am not quite sure what Linus is thinking of here, and in
  particular what an (eventually) extended interface should look like.

  I have been thinking of maybe introduce a "git_lstat()" like wrapper
  with the same arguments as the lstat() function, and then when
  possible, retrieve the contents of 'struct stat' from the cache.
  Would this sort of "git_lstat()" like wrapper be useful?

  I think it then could be used to simplify the first 7 lines of the
  check_removed() function inside diff-lib.c, to in some cases be able
  to do 1 instead of 2 calls to lstat(), and maybe similar things at
  other places?

  Should the wrapper return an error if it is able to discover an
  symlink somewhere in the path, like has_symlink_leading_path()
  already does today?

  But, unless someone strongly ask for this "git_lstat()" like wrapper
  to be included inside this patch series, I guess this should be
  future work.

  I hope that the now 3 patches is at least one step in the right
  direction!

| We should also make sure that we're careful about all the
| invalidation, ie when we remove a link and replace it by a directory
| we should invalidate the symlink cache if it matches (and vice versa
| for the directory cache).

  This is an argument for the clear_lstat_cache() function, I guess.

-----

I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.

After some more investigation and thinking, I have made 3 patches and
been able to optimise away over 40% of all lstat() calls in some cases
for the 'git checkout' command.  Also, if you use a large path to the
'--prefix' argument to the 'git checkout-index' command, and you have
lots of files, the savings can be really huge!

The 3 patches is against git master, and the git 'make test' test
suite still passes after each patch.  To document the improvement,
below is some numbers, which compares before and after the 3
patches. To reproduce the numbers:

- git clone the Linux git tree to be able to get the Linux tags
  'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25

Then, when the current branch is 'my-v2.6.25', do:

  strace -o strace_to27 -T git checkout -q my-v2.6.27

And then pretty print the 'strace_to27' file.  Below is the numbers
from the current git version (before the 3 patches).  Notice that we
do an lstat() call on the "arch" directory over 6000 times!

TOTAL      185151 100.000% OK:165544 NOT: 19607  11.136001 sec   60 usec/call
lstat64    120954  65.327% OK:107013 NOT: 13941   5.388727 sec   45 usec/call
  strings  120954 tot  30163 uniq   4.010 /uniq   5.388727 sec   45 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   2.740520 sec   45 usec/call
  dirs      45522 tot   1436 uniq  31.701 /uniq   1.994448 sec   44 usec/call
  errors    13941 tot   5189 uniq   2.687 /uniq   0.653759 sec   47 usec/call
             6297   5.206% OK:  6297 NOT:     0  "arch"
             4544   3.757% OK:  4544 NOT:     0  "drivers"
             1816   1.501% OK:  1816 NOT:     0  "arch/arm"
             1499   1.239% OK:  1499 NOT:     0  "include"
              912   0.754% OK:   912 NOT:     0  "arch/powerpc"
              764   0.632% OK:   764 NOT:     0  "fs"
              746   0.617% OK:   746 NOT:     0  "drivers/net"
              662   0.547% OK:   662 NOT:     0  "net"
              652   0.539% OK:   325 NOT:   327  "arch/sparc/include"
              636   0.526% OK:   636 NOT:     0  "drivers/media"
              606   0.501% OK:   606 NOT:     0  "include/linux"
              533   0.441% OK:   533 NOT:     0  "arch/sh"
              522   0.432% OK:   260 NOT:   262  "arch/powerpc/include"
              488   0.403% OK:   243 NOT:   245  "arch/sh/include"
              413   0.341% OK:   413 NOT:     0  "arch/sparc"
              390   0.322% OK:   390 NOT:     0  "arch/x86"
              383   0.317% OK:   383 NOT:     0  "Documentation"
              370   0.306% OK:   184 NOT:   186  "arch/ia64/include"
              366   0.303% OK:   366 NOT:     0  "drivers/media/video"
              348   0.288% OK:   173 NOT:   175  "arch/arm/include"

Here is the numbers after applying the 3 patches.  Notice how nice the
top 20 entries list now looks!

TOTAL      134155 100.000% OK:122102 NOT: 12053  11.069389 sec   83 usec/call
lstat64     69876  52.086% OK: 63491 NOT:  6385   3.410007 sec   49 usec/call
  strings   69876 tot  30163 uniq   2.317 /uniq   3.410007 sec   49 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   3.023238 sec   49 usec/call
  dirs       2000 tot   1436 uniq   1.393 /uniq   0.085953 sec   43 usec/call
  errors     6385 tot   5189 uniq   1.230 /uniq   0.300816 sec   47 usec/call
                4   0.006% OK:     4 NOT:     0  ".gitignore"
                4   0.006% OK:     4 NOT:     0  ".mailmap"
                4   0.006% OK:     4 NOT:     0  "CREDITS"
                4   0.006% OK:     4 NOT:     0  "Documentation/00-INDEX"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-block"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-firmware-acpi"
                4   0.006% OK:     4 NOT:     0  "Documentation/CodingStyle"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-API.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-mapping.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/Makefile"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/gadget.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-api.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-locking.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs-guide.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs_example.c"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/rapidio.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/s390-drivers.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/uio-howto.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/videobook.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/writing_usb_driver.tmpl"

Comments?

Kjetil Barvik (3):
  lstat_cache(): more cache effective symlink/directory detection
  lstat_cache(): introduce has_symlink_or_noent_leading_path() function
  lstat_cache(): introduce has_dirs_only_path() function

 cache.h        |    2 +
 entry.c        |   34 +++------
 symlinks.c     |  210 ++++++++++++++++++++++++++++++++++++++++++++-----------
 unpack-trees.c |    4 +-
 4 files changed, 183 insertions(+), 67 deletions(-)

^ permalink raw reply

* [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-11 13:29 UTC (permalink / raw)
  To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
	Kjetil Barvik
In-Reply-To: <1231680542-17315-1-git-send-email-barvik@broadpark.no>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 7311 bytes --]

Make the cache functionality more effective.  Previously when A/B/C/D
was in the cache and A/B/C/E/file.c was called for, there was no match
at all from the cache.  Now we use the fact that the paths "A", "A/B"
and "A/B/C" are already tested, and we only need to do an lstat() call
on "A/B/C/E".

We only cache/store the last path regardless of its type.  Since the
cache functionality is always used with alphabetically sorted names
(at least it seems so for me), there is no need to store both the last
symlink-leading path and the last real-directory path.  Note that if
the cache is not called with (mostly) alphabetically sorted names,
neither the old, nor this new one, would be very effective.

Previously, when symlink A/B/C/S was cached/stored in the symlink-
leading path, and A/B/C/file.c was called for, it was not easy to use
the fact that we already knew that the paths "A", "A/B" and "A/B/C"
are real directories.

Avoid copying the first path components of the name 2 zillion times
when we test new path components.  Since we always cache/store the
last path, we can copy each component as we test those directly into
the cache.  Previously we ended up doing a memcpy() for the full
path/name right before each lstat() call, and when updating the cache
for each time we have tested a new path component.

We also use less memory, that is, PATH_MAX bytes less memory on the
stack and PATH_MAX bytes less memory on the heap.

Thanks to Junio C Hamano, Linus Torvalds and René Scharfe for valuable
comments to this patch!

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 5a5e781... 1061072... M	symlinks.c
 symlinks.c |  152 ++++++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 112 insertions(+), 40 deletions(-)

diff --git a/symlinks.c b/symlinks.c
index 5a5e781a15d7d9cb60797958433eca896b31ec85..1061072e9b18b12ca22c643bf9a3ea977eeb9916 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,64 +1,136 @@
 #include "cache.h"
 
-struct pathname {
-	int len;
+static struct cache_def {
 	char path[PATH_MAX];
-};
+	int len;
+	int flags;
+} cache;
 
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
+static inline int greatest_match_lstat_cache(int len, const char *name)
 {
-	int match_len = match->len;
-	return (len > match_len &&
-		name[match_len] == '/' &&
-		!memcmp(name, match->path, match_len)) ? match_len : 0;
+	int max_len, match_len = 0, i = 0;
+
+	max_len = len < cache.len ? len : cache.len;
+	while (i < max_len && name[i] == cache.path[i]) {
+		if (name[i] == '/') match_len = i;
+		i++;
+	}
+	if (i == cache.len && len > cache.len && name[cache.len] == '/')
+		match_len = cache.len;
+	else if (i == len && len < cache.len && cache.path[len] == '/')
+		match_len = len;
+	else if (i == len && i == cache.len)
+		match_len = len;
+	return match_len;
 }
 
-static inline void set_pathname(int len, const char *name, struct pathname *match)
+static inline void reset_lstat_cache(void)
 {
-	if (len < PATH_MAX) {
-		match->len = len;
-		memcpy(match->path, name, len);
-		match->path[len] = 0;
-	}
+	cache.path[0] = '\0';
+	cache.len = 0;
+	cache.flags = 0;
 }
 
-int has_symlink_leading_path(int len, const char *name)
+#define FL_DIR      (1 << 0)
+#define FL_SYMLINK  (1 << 1)
+#define FL_LSTATERR (1 << 2)
+#define FL_ERR      (1 << 3)
+
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This is can be indicated by the 'track_flags' argument.
+ */
+static int lstat_cache(int len, const char *name,
+		       int track_flags)
 {
-	static struct pathname link, nonlink;
-	char path[PATH_MAX];
+	int match_len, last_slash, last_slash_dir, max_len;
+	int match_flags, ret_flags, save_flags;
 	struct stat st;
-	char *sp;
-	int known_dir;
 
 	/*
-	 * See if the last known symlink cache matches.
+	 * Check to see if we have a match from the cache for the
+	 * symlink path type.
 	 */
-	if (match_pathname(len, name, &link))
-		return 1;
-
+	match_len = last_slash = greatest_match_lstat_cache(len, name);
+	match_flags = cache.flags & track_flags & FL_SYMLINK;
+	if (match_flags && match_len == cache.len)
+		return match_flags;
 	/*
-	 * Get rid of the last known directory part
+	 * If 'name' is a substring of the cache on a path component
+	 * basis, and a directory is cached, we can return
+	 * immediately.
 	 */
-	known_dir = match_pathname(len, name, &nonlink);
+	match_flags = cache.flags & track_flags & FL_DIR;
+	if (match_flags && match_len == len)
+		return match_flags;
 
-	while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
-		int thislen = sp - name ;
-		memcpy(path, name, thislen);
-		path[thislen] = 0;
+	/* Okay, no match from the cache so far, so now we have to
+	 * check the rest of the path components.
+	 */
+	ret_flags = FL_DIR;
+	last_slash_dir = last_slash;
+	max_len = len < PATH_MAX ? len : PATH_MAX;
+	while (match_len < max_len) {
+		do {
+			cache.path[match_len] = name[match_len];
+			match_len++;
+		} while (match_len < max_len && name[match_len] != '/');
+		if (match_len >= max_len)
+			break;
+		last_slash = match_len;
+		cache.path[last_slash] = '\0';
 
-		if (lstat(path, &st))
-			return 0;
-		if (S_ISDIR(st.st_mode)) {
-			set_pathname(thislen, path, &nonlink);
-			known_dir = thislen;
+		if (lstat(cache.path, &st)) {
+			ret_flags = FL_LSTATERR;
+		} else if (S_ISDIR(st.st_mode)) {
+			last_slash_dir = last_slash;
 			continue;
-		}
-		if (S_ISLNK(st.st_mode)) {
-			set_pathname(thislen, path, &link);
-			return 1;
+		} else if (S_ISLNK(st.st_mode)) {
+			ret_flags = FL_SYMLINK;
+		} else {
+			ret_flags = FL_ERR;
 		}
 		break;
 	}
-	return 0;
+
+	/* At the end update the cache.  Note that max 2 different
+	 * path types, FL_SYMLINK and FL_DIR, can be cached for the
+	 * moment!
+	 */
+	save_flags = ret_flags & track_flags & FL_SYMLINK;
+	if (save_flags && last_slash > 0 && last_slash <= PATH_MAX) {
+		cache.path[last_slash] = '\0';
+		cache.len   = last_slash;
+		cache.flags = save_flags;
+	} else if (track_flags & FL_DIR &&
+		   last_slash_dir > 0 && last_slash_dir <= PATH_MAX) {
+		/* We have a separate test for the directory case,
+		 * since it could be that we have found a symlink and
+		 * the track_flags says that we can not cache this
+		 * fact, so the cache would then have been left empty
+		 * in this case.
+		 *
+		 * But, if we is allowed to track real directories, we
+		 * can still cache the path components before the last
+		 * one (the found symlink component).
+		 */
+		cache.path[last_slash_dir] = '\0';
+		cache.len   = last_slash_dir;
+		cache.flags = FL_DIR;
+	} else {
+		reset_lstat_cache();
+	}
+	return ret_flags;
+}
+
+/* Return non-zero if path 'name' has a leading symlink component.
+ */
+int has_symlink_leading_path(int len, const char *name)
+{
+	return lstat_cache(len, name,
+			   FL_SYMLINK|FL_DIR) &
+		FL_SYMLINK;
 }
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v6 3/3] lstat_cache(): introduce has_dirs_only_path() function
From: Kjetil Barvik @ 2009-01-11 13:29 UTC (permalink / raw)
  To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
	Kjetil Barvik
In-Reply-To: <1231680542-17315-1-git-send-email-barvik@broadpark.no>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 8518 bytes --]

Inside the create_directories() function inside entry.c we currently
do an stat() or lstat() call for each path component of the pathname
'path' each and every time called.  For the 'git checkout' command,
this function is called on each file for which we must do an update
(ce->ce_flags & CE_UPDATE), so we get lots and lots of calls...

To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions.  Since the paths given to the create_directories()
function, is sorted alphabetically, the new wrapper would be very
cache effective in this situation.

To support it we must update the lstat_cache() function to be able to
say that "pleace test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.

Thanks to Junio C Hamano, Linus Torvalds and René Scharfe for valuable
comments to this patch!

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 11181aa... 7c8c8e4... M	cache.h
:100644 100644 aa2ee46... b0295bd... M	entry.c
:100644 100644 0004e97... c92fac7... M	symlinks.c
 cache.h    |    1 +
 entry.c    |   34 +++++++++++-----------------------
 symlinks.c |   59 ++++++++++++++++++++++++++++++++++++++++++++---------------
 3 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/cache.h b/cache.h
index 11181aa0079ce94bfbdb2bba77205f49aa3cbcb3..7c8c8e484259a69bdc2c75e9e820247b0df04d01 100644
--- a/cache.h
+++ b/cache.h
@@ -721,6 +721,7 @@ struct checkout {
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
 extern int has_symlink_leading_path(int len, const char *name);
 extern int has_symlink_or_noent_leading_path(int len, const char *name);
+extern int has_dirs_only_path(int len, const char *name, int prefix_len);
 
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..b0295bd5559d4a106d977da7a54b0cdd8399c208 100644
--- a/entry.c
+++ b/entry.c
@@ -8,35 +8,23 @@ static void create_directories(const char *path, const struct checkout *state)
 	const char *slash = path;
 
 	while ((slash = strchr(slash+1, '/')) != NULL) {
-		struct stat st;
-		int stat_status;
-
 		len = slash - path;
 		memcpy(buf, path, len);
 		buf[len] = 0;
 
-		if (len <= state->base_dir_len)
-			/*
-			 * checkout-index --prefix=<dir>; <dir> is
-			 * allowed to be a symlink to an existing
-			 * directory.
-			 */
-			stat_status = stat(buf, &st);
-		else
-			/*
-			 * if there currently is a symlink, we would
-			 * want to replace it with a real directory.
-			 */
-			stat_status = lstat(buf, &st);
-
-		if (!stat_status && S_ISDIR(st.st_mode))
+		/* For 'checkout-index --prefix=<dir>', <dir> is
+		 * allowed to be a symlink to an existing directory,
+		 * and we set 'state->base_dir_len' below, such that
+		 * we test the path components of the prefix with the
+		 * stat() function instead of the lstat() function.
+		 */
+		if (has_dirs_only_path(len, buf, state->base_dir_len))
 			continue; /* ok, it is already a directory. */
 
-		/*
-		 * We know stat_status == 0 means something exists
-		 * there and this mkdir would fail, but that is an
-		 * error codepath; we do not care, as we unlink and
-		 * mkdir again in such a case.
+		/* If this mkdir() would fail, it could be that there
+		 * is already a symlink or something else exists
+		 * there, therefore we then try to unlink it and try
+		 * one more time to create the directory.
 		 */
 		if (mkdir(buf, 0777)) {
 			if (errno == EEXIST && state->force &&
diff --git a/symlinks.c b/symlinks.c
index 0004e97d2547467564b146232f40cba4f5b04a3e..c92fac7506a702157e4b780d4331f3c7b412798d 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,10 +1,11 @@
 #include "cache.h"
 
 static struct cache_def {
-	char path[PATH_MAX];
+	char path[PATH_MAX + 1];
 	int len;
 	int flags;
 	int track_flags;
+	int prefix_len_stat_func;
 } cache;
 
 static inline int greatest_match_lstat_cache(int len, const char *name)
@@ -25,12 +26,13 @@ static inline int greatest_match_lstat_cache(int len, const char *name)
 	return match_len;
 }
 
-static inline void reset_lstat_cache(int track_flags)
+static inline void reset_lstat_cache(int track_flags, int prefix_len_stat_func)
 {
 	cache.path[0] = '\0';
 	cache.len = 0;
 	cache.flags = 0;
 	cache.track_flags = track_flags;
+	cache.prefix_len_stat_func = prefix_len_stat_func;
 }
 
 #define FL_DIR      (1 << 0)
@@ -38,28 +40,35 @@ static inline void reset_lstat_cache(int track_flags)
 #define FL_SYMLINK  (1 << 2)
 #define FL_LSTATERR (1 << 3)
 #define FL_ERR      (1 << 4)
+#define FL_FULLPATH (1 << 5)
 
 /*
  * Check if name 'name' of length 'len' has a symlink leading
  * component, or if the directory exists and is real, or not.
  *
  * To speed up the check, some information is allowed to be cached.
- * This is can be indicated by the 'track_flags' argument.
+ * This is can be indicated by the 'track_flags' argument, which also
+ * can be used to indicate that we should always check the full path.
+ *
+ * The 'prefix_len_stat_func' parameter can be used to set the length
+ * of the prefix, where the cache should use the stat() function
+ * instead of the lstat() function to test each path component.
  */
 static int lstat_cache(int len, const char *name,
-		       int track_flags)
+		       int track_flags, int prefix_len_stat_func)
 {
-	int match_len, last_slash, last_slash_dir, max_len;
+	int match_len, last_slash, last_slash_dir, max_len, ret;
 	int match_flags, ret_flags, save_flags;
 	struct stat st;
 
-	if (cache.track_flags != track_flags) {
+	if (cache.track_flags != track_flags ||
+	    cache.prefix_len_stat_func != prefix_len_stat_func) {
 		/*
-		 * As a safeguard we clear the cache if the value of
-		 * track_flags does not match with the last supplied
-		 * value.
+		 * As a safeguard we clear the cache if the values of
+		 * track_flags and/or prefix_len_stat_func does not
+		 * match with the last supplied values.
 		 */
-		reset_lstat_cache(track_flags);
+		reset_lstat_cache(track_flags, prefix_len_stat_func);
 		match_len = last_slash = 0;
 	} else {
 		/*
@@ -91,12 +100,17 @@ static int lstat_cache(int len, const char *name,
 			cache.path[match_len] = name[match_len];
 			match_len++;
 		} while (match_len < max_len && name[match_len] != '/');
-		if (match_len >= max_len)
+		if (match_len >= max_len && !(track_flags & FL_FULLPATH))
 			break;
 		last_slash = match_len;
 		cache.path[last_slash] = '\0';
 
-		if (lstat(cache.path, &st)) {
+		if (last_slash <= prefix_len_stat_func)
+			ret = stat(cache.path, &st);
+		else
+			ret = lstat(cache.path, &st);
+
+		if (ret) {
 			ret_flags = FL_LSTATERR;
 			if (errno == ENOENT)
 				ret_flags |= FL_NOENT;
@@ -136,17 +150,19 @@ static int lstat_cache(int len, const char *name,
 		cache.len   = last_slash_dir;
 		cache.flags = FL_DIR;
 	} else {
-		reset_lstat_cache(track_flags);
+		reset_lstat_cache(track_flags, prefix_len_stat_func);
 	}
 	return ret_flags;
 }
 
+#define USE_ONLY_LSTAT  0
+
 /* Return non-zero if path 'name' has a leading symlink component.
  */
 int has_symlink_leading_path(int len, const char *name)
 {
 	return lstat_cache(len, name,
-			   FL_SYMLINK|FL_DIR) &
+			   FL_SYMLINK|FL_DIR, USE_ONLY_LSTAT) &
 		FL_SYMLINK;
 }
 
@@ -156,6 +172,19 @@ int has_symlink_leading_path(int len, const char *name)
 int has_symlink_or_noent_leading_path(int len, const char *name)
 {
 	return lstat_cache(len, name,
-			   FL_SYMLINK|FL_NOENT|FL_DIR) &
+			   FL_SYMLINK|FL_NOENT|FL_DIR, USE_ONLY_LSTAT) &
 		(FL_SYMLINK|FL_NOENT);
 }
+
+/* Return non-zero if all path components of 'name' exists as a
+ * directory.  If prefix_len > 0, we will test with the stat()
+ * function instead of the lstat() function for a prefix length of
+ * 'prefix_len', thus we then allow for symlink(s) in the prefix part
+ * as long as those points to real existing directories.
+ */
+int has_dirs_only_path(int len, const char *name, int prefix_len)
+{
+	return lstat_cache(len, name,
+			   FL_DIR|FL_FULLPATH, prefix_len) &
+		FL_DIR;
+}
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v6 2/3] lstat_cache(): introduce has_symlink_or_noent_leading_path() function
From: Kjetil Barvik @ 2009-01-11 13:29 UTC (permalink / raw)
  To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
	Kjetil Barvik
In-Reply-To: <1231680542-17315-1-git-send-email-barvik@broadpark.no>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 7998 bytes --]

In some cases, especially inside the unpack-trees.c file, and inside
the verify_absent() function, we can avoid some unnecessary calls to
lstat(), if the lstat_cache() function can also can be told to keep
track of none existing directories.

So we update the lstat_cache() function to handle this fact new fact,
introduce a new wrapper function, and the result is that we save lots
of lstat() calls for a removed directory which previously contained
lots of files, when we call this new wrapper of lstat_cache() instead
of the old one.

We do similar changes inside the unlink_entry() function, since if we
can already say that the leading directory component of a pathname
does not exist, it is not necessary to try to remove a pathname below
it!

Thanks to Junio C Hamano, Linus Torvalds and René Scharfe for valuable
comments to this patch!

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 231c06d... 11181aa... M	cache.h
:100644 100644 1061072... 0004e97... M	symlinks.c
:100644 100644 54f301d... a3fd383... M	unpack-trees.c
 cache.h        |    1 +
 symlinks.c     |   87 ++++++++++++++++++++++++++++++++++++--------------------
 unpack-trees.c |    4 +-
 3 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..11181aa0079ce94bfbdb2bba77205f49aa3cbcb3 100644
--- a/cache.h
+++ b/cache.h
@@ -720,6 +720,7 @@ struct checkout {
 
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
 extern int has_symlink_leading_path(int len, const char *name);
+extern int has_symlink_or_noent_leading_path(int len, const char *name);
 
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
diff --git a/symlinks.c b/symlinks.c
index 1061072e9b18b12ca22c643bf9a3ea977eeb9916..0004e97d2547467564b146232f40cba4f5b04a3e 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -4,6 +4,7 @@ static struct cache_def {
 	char path[PATH_MAX];
 	int len;
 	int flags;
+	int track_flags;
 } cache;
 
 static inline int greatest_match_lstat_cache(int len, const char *name)
@@ -24,21 +25,23 @@ static inline int greatest_match_lstat_cache(int len, const char *name)
 	return match_len;
 }
 
-static inline void reset_lstat_cache(void)
+static inline void reset_lstat_cache(int track_flags)
 {
 	cache.path[0] = '\0';
 	cache.len = 0;
 	cache.flags = 0;
+	cache.track_flags = track_flags;
 }
 
 #define FL_DIR      (1 << 0)
-#define FL_SYMLINK  (1 << 1)
-#define FL_LSTATERR (1 << 2)
-#define FL_ERR      (1 << 3)
+#define FL_NOENT    (1 << 1)
+#define FL_SYMLINK  (1 << 2)
+#define FL_LSTATERR (1 << 3)
+#define FL_ERR      (1 << 4)
 
 /*
  * Check if name 'name' of length 'len' has a symlink leading
- * component, or if the directory exists and is real.
+ * component, or if the directory exists and is real, or not.
  *
  * To speed up the check, some information is allowed to be cached.
  * This is can be indicated by the 'track_flags' argument.
@@ -50,22 +53,32 @@ static int lstat_cache(int len, const char *name,
 	int match_flags, ret_flags, save_flags;
 	struct stat st;
 
-	/*
-	 * Check to see if we have a match from the cache for the
-	 * symlink path type.
-	 */
-	match_len = last_slash = greatest_match_lstat_cache(len, name);
-	match_flags = cache.flags & track_flags & FL_SYMLINK;
-	if (match_flags && match_len == cache.len)
-		return match_flags;
-	/*
-	 * If 'name' is a substring of the cache on a path component
-	 * basis, and a directory is cached, we can return
-	 * immediately.
-	 */
-	match_flags = cache.flags & track_flags & FL_DIR;
-	if (match_flags && match_len == len)
-		return match_flags;
+	if (cache.track_flags != track_flags) {
+		/*
+		 * As a safeguard we clear the cache if the value of
+		 * track_flags does not match with the last supplied
+		 * value.
+		 */
+		reset_lstat_cache(track_flags);
+		match_len = last_slash = 0;
+	} else {
+		/*
+		 * Check to see if we have a match from the cache for
+		 * the 2 "excluding" path types.
+		 */
+		match_len = last_slash = greatest_match_lstat_cache(len, name);
+		match_flags = cache.flags & track_flags & (FL_NOENT|FL_SYMLINK);
+		if (match_flags && match_len == cache.len)
+			return match_flags;
+		/*
+		 * If 'name' is a substring of the cache on a path
+		 * component basis, and a directory is cached, we
+		 * can return immediately.
+		 */
+		match_flags = cache.flags & track_flags & FL_DIR;
+		if (match_flags && match_len == len)
+			return match_flags;
+	}
 
 	/* Okay, no match from the cache so far, so now we have to
 	 * check the rest of the path components.
@@ -85,6 +98,8 @@ static int lstat_cache(int len, const char *name,
 
 		if (lstat(cache.path, &st)) {
 			ret_flags = FL_LSTATERR;
+			if (errno == ENOENT)
+				ret_flags |= FL_NOENT;
 		} else if (S_ISDIR(st.st_mode)) {
 			last_slash_dir = last_slash;
 			continue;
@@ -96,11 +111,11 @@ static int lstat_cache(int len, const char *name,
 		break;
 	}
 
-	/* At the end update the cache.  Note that max 2 different
-	 * path types, FL_SYMLINK and FL_DIR, can be cached for the
-	 * moment!
+	/* At the end update the cache.  Note that max 3 different
+	 * path types, FL_NOENT, FL_SYMLINK and FL_DIR, can be cached
+	 * for the moment!
 	 */
-	save_flags = ret_flags & track_flags & FL_SYMLINK;
+	save_flags = ret_flags & track_flags & (FL_NOENT|FL_SYMLINK);
 	if (save_flags && last_slash > 0 && last_slash <= PATH_MAX) {
 		cache.path[last_slash] = '\0';
 		cache.len   = last_slash;
@@ -108,20 +123,20 @@ static int lstat_cache(int len, const char *name,
 	} else if (track_flags & FL_DIR &&
 		   last_slash_dir > 0 && last_slash_dir <= PATH_MAX) {
 		/* We have a separate test for the directory case,
-		 * since it could be that we have found a symlink and
-		 * the track_flags says that we can not cache this
-		 * fact, so the cache would then have been left empty
-		 * in this case.
+		 * since it could be that we have found a symlink or a
+		 * none existing directory and the track_flags says
+		 * that we can not cache this fact, so the cache would
+		 * then have been left empty in this case.
 		 *
 		 * But, if we is allowed to track real directories, we
 		 * can still cache the path components before the last
-		 * one (the found symlink component).
+		 * one (the found symlink or none existing component).
 		 */
 		cache.path[last_slash_dir] = '\0';
 		cache.len   = last_slash_dir;
 		cache.flags = FL_DIR;
 	} else {
-		reset_lstat_cache();
+		reset_lstat_cache(track_flags);
 	}
 	return ret_flags;
 }
@@ -134,3 +149,13 @@ int has_symlink_leading_path(int len, const char *name)
 			   FL_SYMLINK|FL_DIR) &
 		FL_SYMLINK;
 }
+
+/* Return non-zero if path 'name' has a leading symlink component or
+ * if some leading path component does not exists.
+ */
+int has_symlink_or_noent_leading_path(int len, const char *name)
+{
+	return lstat_cache(len, name,
+			   FL_SYMLINK|FL_NOENT|FL_DIR) &
+		(FL_SYMLINK|FL_NOENT);
+}
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..a3fd383afbe951fea8dbe4378cbe489657843c4a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,7 @@ static void unlink_entry(struct cache_entry *ce)
 	char *cp, *prev;
 	char *name = ce->name;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return;
 	if (unlink(name))
 		return;
@@ -584,7 +584,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	if (o->index_only || o->reset || !o->update)
 		return 0;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* Re: [PATCH] filter-branch: add git_commit_non_empty_tree and --prune-empty.
From: Johannes Schindelin @ 2009-01-11 13:35 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Jay Soffian, Junio C Hamano, git, pasky, srabbelier
In-Reply-To: <20090111111800.GA8032@artemis.corp>

Hi,

On Sun, 11 Jan 2009, Pierre Habouzit wrote:

> On Fri, Jan 09, 2009 at 07:29:15PM +0000, Jay Soffian wrote:
> > On Mon, Nov 3, 2008 at 10:18 AM, Pierre Habouzit <madcoder@debian.org> wrote:
> > > On Mon, Nov 03, 2008 at 09:27:29AM +0000, Pierre Habouzit wrote:
> > >> On Mon, Nov 03, 2008 at 04:58:44AM +0000, Junio C Hamano wrote:
> > 
> > Bump, http://thread.gmane.org/gmane.comp.version-control.git/99440/
> > 
> > (I'd like to see this included. Having a bunch of empty commits after
> > using filter-branch to remove unwanted files from history is, er,
> > sub-optimal, so seems like it might even be default behavior?)
> 
> Yeah I have that in my own git tree, and I meant to ask if something had 
> to be fixed for it to be accepted for some time, but always forget about 
> it.
> 
> Junio, do you think this could be accepted, or does it need some work ?

AFAICT Junio had some style issues which were not addressed.

And I suggested to merge the tests with Sverre's patch.  That suggestion 
also went unaddressed.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH v3 1/4] word diff: comments, preparations for regex customization
From: Johannes Schindelin @ 2009-01-11 13:41 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano, Teemu Likonen
In-Reply-To: <4aea85caafd38a058145c5769fe8a30ffdbd4d13.1231669012.git.trast@student.ethz.ch>

Hi,

On Sun, 11 Jan 2009, Thomas Rast wrote:

> This reorganizes the code for diff --color-words in a way that will be
> convenient for the next patch, without changing any of the semantics.
> The new variables are not used yet except for their default state.
> 
> We also add some comments on the workings of diff_words_show() and
> associated helper routines.
> 
> Signed-off-by: Thomas Rast <trast@student.ethz.ch>
> ---

It not only got bigger, it still fails to explain the idea to me.

I'll work on it myself, then,
Dscho

^ permalink raw reply

* Re: [PATCH] filter-branch: add git_commit_non_empty_tree and  --prune-empty.
From: Pierre Habouzit @ 2009-01-11 14:27 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jay Soffian, Junio C Hamano, git, pasky, srabbelier
In-Reply-To: <alpine.DEB.1.00.0901111433580.3586@pacific.mpi-cbg.de>

[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]

On Sun, Jan 11, 2009 at 01:35:15PM +0000, Johannes Schindelin wrote:
> Hi,
> 
> On Sun, 11 Jan 2009, Pierre Habouzit wrote:
> 
> > On Fri, Jan 09, 2009 at 07:29:15PM +0000, Jay Soffian wrote:
> > > On Mon, Nov 3, 2008 at 10:18 AM, Pierre Habouzit <madcoder@debian.org> wrote:
> > > > On Mon, Nov 03, 2008 at 09:27:29AM +0000, Pierre Habouzit wrote:
> > > >> On Mon, Nov 03, 2008 at 04:58:44AM +0000, Junio C Hamano wrote:
> > > 
> > > Bump, http://thread.gmane.org/gmane.comp.version-control.git/99440/
> > > 
> > > (I'd like to see this included. Having a bunch of empty commits after
> > > using filter-branch to remove unwanted files from history is, er,
> > > sub-optimal, so seems like it might even be default behavior?)
> > 
> > Yeah I have that in my own git tree, and I meant to ask if something had 
> > to be fixed for it to be accepted for some time, but always forget about 
> > it.
> > 
> > Junio, do you think this could be accepted, or does it need some work ?
> 
> AFAICT Junio had some style issues which were not addressed.

Huh, I did in a latter resend in that very thread.

> And I suggested to merge the tests with Sverre's patch.  That suggestion 
> also went unaddressed.

I can't find any mails from Sverre in the same thread, but maybe I'm not
searching in the proper place...

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox